Patent Abstract:
The present invention relates to performing a cache-based trace recording using cache coherence protocol (CCP) data. Embodiments detect that an operation that causes an interaction between a cache line and a backing store has occurred, that logging is enabled for the processing unit that caused the operation, that the cache line is a participant in logging, and that the CCP indicates that there is data to be logged to a trace. Embodiments then cause that data to be logged to the trace, which data is usable to replay the operation.
Publication number: BR112020003342A2
Application number: R112020003342-1
Filing date: 2018-06-22
Publication date: 2020-08-18
Inventor: Jordi Mola
Applicant: Microsoft Technology Licensing, LLC
IPC primary class:
Patent description:

[001] When writing code during the development of software applications, developers commonly spend a significant amount of time "debugging" the code to find runtime errors and other source code errors. In doing so, developers may take several approaches to reproducing and localizing a source code bug, such as observing the behavior of a program based on different inputs, inserting debugging code (for example, to print variable values, to track branches of execution, etc.), temporarily removing portions of code, etc. Tracking down runtime errors to pinpoint code bugs can occupy a significant portion of application development time.
[002] Many types of debugging applications ("debuggers") have been developed in order to assist developers with the code debugging process. These tools offer developers the ability to trace, visualize, and alter the execution of computer code. For example, debuggers can visualize the execution of code instructions, can present code variable values at various times during code execution, can enable developers to alter code execution paths, and/or can enable developers to set "breakpoints" and/or "watchpoints" on code elements of interest (which, when reached during execution, cause execution of the code to be suspended), among other things.
[003] An emerging form of debugging applications enables "time travel", "reverse", or "historical" debugging. With "time travel" debugging, execution of a program (for example, executable entities such as threads) is recorded/traced by a trace application into one or more trace files. These trace file(s) can then be used to replay execution of the program later, for both forward and backward analysis. For example, "time travel" debuggers can enable a developer to set forward breakpoints/watchpoints (like conventional debuggers) as well as reverse breakpoints/watchpoints.

BRIEF SUMMARY
[004] Embodiments herein enhance "time travel" debugging recordings by using a processor's shared cache, along with its cache coherence protocol (CCP), in order to determine what data should be recorded to a trace file. Doing so can reduce trace file size by several orders of magnitude when compared to prior approaches, thereby significantly reducing the overhead of trace recording.
[005] Some embodiments are implemented in computing environments that include (i) a plurality of processing units, and (ii) a cache memory comprising a plurality of cache lines that are used to cache data from one or more backing stores and that is shared by the plurality of processing units. Consistency between data in the plurality of cache lines and the one or more backing stores is managed according to a cache coherence protocol.
[006] These embodiments include performing a cache-based trace recording using CCP data.
[007] This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

BRIEF DESCRIPTION OF THE DRAWINGS
[008] In order to describe the manner in which the above-recited and other advantages and features of the invention can be obtained, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope, the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings, in which:
[009] Figure 1 illustrates an exemplary computing environment that facilitates recording "bit-accurate" traces of code execution via shared caches using cache coherence protocol data;
[0010] Figure 2 illustrates an example of a shared cache;
[0011] Figure 3 illustrates a flowchart of an exemplary method for performing a cache-based trace recording using CCP data;
[0012] Figure 4A illustrates an exemplary shared cache that extends each of its cache lines with one or more additional accounting bits;
[0013] Figure 4B illustrates an example of a shared cache that includes one or more cache lines reserved for storing accounting bits that apply to conventional cache lines;
[0014] Figure 5 illustrates an example of associative cache mappings;
[0015] Figure 6A illustrates a table showing exemplary read and write activity by four processing units on a single line in a shared cache;
[0016] Figure 6B illustrates a table showing exemplary tracked cache coherence state based on the read and write activity shown in Figure 6A;
[0017] Figure 6C illustrates a table showing exemplary data stored in accounting bits (that is, unit bits, index bits, and/or flag bits) of a shared cache based on the read and write activity shown in Figure 6A;
[0018] Figure 6D illustrates a table showing exemplary log data that could be written to trace files in connection with the read and write activity shown in Figure 6A;
[0019] Figure 7A illustrates an example in which some read->read transitions could be omitted from a trace depending on how processing units are tracked;
[0020] Figure 7B illustrates an example of log data that omits the read->read transitions highlighted in Figure 7A;
[0021] Figure 7C illustrates a table showing exemplary log data that could be recorded if "index bits" are used and the indexes are updated on reads;
[0022] Figure 8A illustrates an exemplary computing environment that includes two processors, each including four processing units, and L1-L3 caches;
[0023] Figure 8B illustrates a table showing exemplary read and write operations performed by some of the processing units of Figure 8A;
[0024] Figure 9A illustrates a table showing exemplary reads and writes by two processing units;
[0025] Figure 9B illustrates an example of a table that compares when log entries could be made in an environment that provides per-processing-unit CCP information plus a cache line flag bit, versus an environment that provides CCP index information plus a cache line flag bit;
[0026] Figure 10A illustrates an example of different parts of a memory address, and their relationship to associative caches; and
[0027] Figure 10B illustrates an example of logging cache misses and cache evictions in an associative cache.

DETAILED DESCRIPTION
[0028] Embodiments herein enhance "time travel" debugging recordings by using a processor's shared cache, along with its cache coherence protocol, in order to determine what data should be recorded to a trace file. Doing so can reduce trace file size by several orders of magnitude when compared to prior approaches, thereby significantly reducing the overhead of trace recording.
[0029] Figure 1 illustrates an exemplary computing environment 100 that facilitates recording "bit-accurate" traces of code execution via shared caches using cache coherence protocol data. As shown, embodiments may comprise or utilize a special-purpose or general-purpose computer system 101 that includes computer hardware, such as, for example, one or more processor(s) 102, system memory 103, one or more data stores 104, and/or input/output hardware 105.
[0030] Embodiments within the scope of the present invention include physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by the computer system 101. Computer-readable media that store computer-executable instructions and/or data structures are computer storage devices. Computer-readable media that carry computer-executable instructions and/or data structures are transmission media. Thus, by way of example, and not limitation, embodiments of the invention can comprise at least two distinctly different kinds of computer-readable media: computer storage devices and transmission media.
[0031] Computer storage devices are physical hardware devices that store computer-executable instructions and/or data structures. Computer storage devices include various computer hardware, such as RAM, ROM, EEPROM, solid state drives ("SSDs"), flash memory, phase-change memory ("PCM"), storage
[0032] Transmission media can include a network and/or data links which can be used to carry program code in the form of computer-executable instructions or data structures, and which can be accessed by the computer system 101. A "network" is defined as one or more data links that enable the transport of electronic data between computer systems and/or modules and/or other electronic devices. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computer system, the computer system may view the connection as transmission media. Combinations of the above should also be included within the scope of computer-readable media. For example, input/output hardware 105 may comprise hardware (for example, a network interface module (for example, a "NIC")) that connects to a network and/or data link which can be used to carry program code in the form of computer-executable instructions or data structures.
[0033] Further, upon reaching various computer system components, program code in the form of computer-executable instructions or data structures can be transferred automatically from transmission media to computer storage devices (or vice versa). For example, computer-executable instructions or data structures received over a network or data link can be buffered in RAM within a NIC (for example, input/output hardware 105), and then eventually transferred to system memory 103 and/or to less volatile computer storage devices (for example, data store 104) at the computer system 101. Thus, it should be understood that computer storage devices can be included in computer system components that also (or even primarily) utilize transmission media.
[0034] Computer-executable instructions comprise, for example, instructions and data which, when executed at the processor(s) 102, cause the computer system 101 to perform a certain function or group of functions. Computer-executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, or even source code.
[0035] Those skilled in the art will appreciate that the invention may be practiced in network computing environments with many types of computer system configurations, including personal computers, desktop computers, laptop computers, message processors, handheld devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile telephones, PDAs, tablets, pagers, routers,
[0036] Those skilled in the art will also appreciate that the invention may be practiced in a cloud computing environment. Cloud computing environments may be distributed, although this is not required. When distributed, cloud computing environments may be distributed internationally within an organization and/or have components possessed across multiple organizations. In this description and in the following claims, "cloud computing" is defined as a model for enabling on-demand network access to a shared pool of configurable computing resources (for example, networks, servers, storage, applications, and/or services). The definition of "cloud computing" is not limited to any of the other numerous advantages that can be obtained from such a model when properly deployed.
[0037] A cloud computing model can be composed of various characteristics, such as on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service, and so forth. A cloud computing model may also come in the form of various service models, such as, for example, Software as a Service ("SaaS"), Platform as a Service ("PaaS"), and Infrastructure as a Service ("IaaS").
[0038] Some embodiments, such as a cloud computing environment, may comprise a system that includes one or more hosts that are each capable of running one or more virtual machines. During operation, virtual machines emulate an operational computing system, supporting an operating system and perhaps one or more other applications as well. In some embodiments, each host includes a hypervisor that emulates virtual resources for the virtual machines using physical resources that are abstracted from view of the virtual machines. The hypervisor also provides proper isolation between the virtual machines. Thus, from the perspective of any given virtual machine, the hypervisor provides the illusion that the virtual machine is interfacing with a physical resource, even though the virtual machine only interfaces with the appearance (for example, a virtual resource) of a physical resource. Examples of physical resources include processing capacity, memory, disk space, network bandwidth, media drives, and so forth.
[0039] As illustrated, the data store 104 can store computer-executable instructions and/or data structures representing application programs such as, for example, a tracker 104a, an operating system kernel 104b, and an application 104c (for example, the application that is the subject of tracing by the tracker 104a), and one or more trace file(s) 104d. When these programs are executing (for example, using the processor(s) 102), the system memory 103 can store corresponding runtime data, such as
[0040] Tracker 104a is usable to record a bit-accurate trace of execution of an application such as application 104c, and to store the trace data in the trace file(s) 104d. In some embodiments, the tracker 104a is a standalone application, while in other embodiments the tracker 104a is integrated into another software component, such as the operating system kernel 104b, a hypervisor, a cloud fabric, etc. Although the trace file(s) 104d are depicted as stored in the data store 104, the trace file(s) 104d may also be recorded exclusively or temporarily in the system memory 103, or at some other storage device.
[0041] Figure 1 includes a simplified representation of the internal hardware components of the processor(s) 102. As illustrated, each processor 102 includes a plurality of processing units 102a. Each processing unit may be physical (that is, a physical processor core) and/or logical (that is, a logical core presented by a physical core that supports hyperthreading, in which more than one application thread executes at the physical core). Thus, for example, even though the processor 102 may in some embodiments include only a single physical processing unit (core), it could include two or more logical processing units 102a presented by that single physical processing unit.
[0042] Each processing unit 102a executes processor instructions that are defined by applications (for example, tracker 104a, operating system kernel 104b, application 104c, etc.), and which instructions are selected from among a predefined processor instruction set architecture (ISA). The particular ISA of each processor 102 varies based on processor manufacturer and processor model. Common ISAs include the IA-64 and IA-32 architectures from INTEL, INC., the AMD64 architecture from ADVANCED MICRO DEVICES, INC., and various Advanced RISC Machine ("ARM") architectures from ARM HOLDINGS, PLC, although a great number of other ISAs exist and can be used by the present invention. In general, an "instruction" is the smallest externally-visible (that is, external to the processor) unit of code that is executable by a processor.
[0043] Each processing unit 102a obtains processor instructions from a shared cache 102b, and executes the processor instructions based on data in the shared cache 102b, based on data in registers 102d, and/or without input data. In general, the shared cache 102b is a small amount (that is, small relative to the typical amount of system memory 103) of random-access memory that stores on-processor copies of portions of a backing store, such as the system memory 103 and/or another cache. For example, when executing the application code 103a, the shared cache 102b contains portions of the application runtime data 103b. If the processing unit(s) 102a require data not already stored in the shared cache 102b, then a "cache miss" occurs, and that data is fetched from the system memory 103 (potentially "evicting" some other data from the shared cache 102b).
[0044] Typically, a shared cache 102b comprises a plurality of "cache lines", each of which stores a chunk of memory from the backing store. For example, Figure 2 illustrates an example of at least a portion of a shared cache 200, which includes a plurality of cache lines 203, each of which includes an address portion 201 and a value portion 202. The address portion 201 of each cache line 203 can store an address in the backing store (for example, system memory 103) to which the line corresponds, and the value portion 202 can initially store a value received from the backing store. The value portion 202 can be modified by the processing units 102a, and is eventually evicted back to the backing store. As indicated by the ellipses, a shared cache 200 can include a large number of cache lines. For example, a contemporary INTEL processor may contain a level 1 cache comprising 512 or more cache lines. In such a cache, each cache line is typically usable to store a 64-byte (512-bit) value in reference to an 8-byte (64-bit) memory address.
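By way of illustration only, the cache line layout of Figure 2 could be modeled as in the following C++ sketch, assuming the 8-byte address and 64-byte value sizes of the example above; the names used (CacheLine, kLineSize) are illustrative assumptions and are not terms defined by this disclosure.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// A minimal model of one cache line 203 of Figure 2: an address
// portion 201 naming the backing-store location, and a value portion
// 202 holding a copy of the data at that location. Sizes follow the
// example in the text; all names here are illustrative assumptions.
constexpr std::size_t kLineSize = 64;

struct CacheLine {
    std::uint64_t address;                      // address portion 201
    std::array<std::uint8_t, kLineSize> value;  // value portion 202
};
```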
[0045] The address stored in the address portion 201 of each cache line 203 may be a physical address, such as the actual memory address in the system memory 103. Alternatively, the address stored in the address portion 201 of each cache line 203 may be a virtual address, which is an address that is mapped to the physical address to provide an abstraction. Such abstractions can be used, for example, to facilitate memory isolation between different processes executing at the processor(s) 102. When virtual addresses are used, a processor 102 may include a translation lookaside buffer (TLB) 102f (usually part of a memory management unit (MMU)), which maintains mappings between physical and virtual memory addresses.
[0046] A shared cache 102b may include a code cache portion and a data cache portion. For example, when executing the application code 103a, the code portion of the shared cache 102b stores at least a portion of the processor instructions stored in the application code 103a, and the data portion of the shared cache 102b stores at least a portion of the data structures of the application runtime data 103b. Often, a processor cache is divided into separate tiers/levels (for example, level 1 (L1), level 2 (L2), and level 3 (L3)), with some tiers (for example, L3) potentially existing separate from the processor(s) 102. Thus, the shared cache 102b may comprise one of these levels (L1), or may comprise a plurality of these levels.
[0047] When multiple cache levels are used, the processing unit(s) 102a interact directly with the lowest level (L1). In most cases, data flows between the levels (for example, on a read an L3 cache interacts with the system memory 103 and serves data to an L2 cache, and the L2 cache in turn serves data to the L1 cache). When a processing unit 102a needs to perform a write, the caches coordinate to ensure that any caches that had a copy of the affected data shared among the processing unit(s) 102a no longer have it. This coordination is performed using a cache coherence protocol (discussed later).
[0048] Caches can be inclusive, exclusive, or include both inclusive and exclusive behaviors. For example, in an inclusive cache an L3 level would store a superset of the data in the L2 levels below it, and the L2 levels store a superset of the L1 levels below them. In exclusive caches, the levels may be disjoint; for example, if data that an L1 cache needs exists in an L3 cache, the caches can swap information, such as data, addresses, and the like.
[0049] Each processor 102 also includes microcode 102c, which comprises control logic (that is, executable instructions) that controls operation of the processor 102, and which generally functions as an interpreter between the processor's hardware and the processor ISA exposed by the processor 102 to executing applications. The microcode 102c may be embodied on on-processor storage, such as ROM, EEPROM, etc.
[0050] Registers 102d are hardware-based storage locations that are defined based on the ISA of the processor(s) 102 and that are read from and/or written to by processor instructions. For example, registers 102d are commonly used to store values fetched from the shared cache 102b for use by instructions, to store the results of executing instructions, and/or to store status or state, such as some of the side effects of executing instructions (for example, the sign of a value changing, a value reaching zero, the occurrence of a carry, etc.), a processor cycle count, etc. Thus, some registers 102d may comprise "flags" that are used to signal some state change caused by executing processor instructions. In some embodiments, processors 102 may also include control registers, which are used to control different aspects of processor operation.
[0051] In some embodiments, the processor(s) 102 may include one or more buffers 102e. As will be discussed hereinafter, the buffer(s) 102e may be used as a temporary storage location for trace data. Thus, for example, the processor(s) 102 may store portions of trace data in the buffer(s) 102e, and flush that data to the trace file(s) 104d at appropriate times, such as when there is available memory bus bandwidth. In some implementations, the buffer(s) 102e could be part of the shared cache 102b.
[0052] As mentioned above, processors that possess a shared cache 102b operate the cache according to a cache coherence protocol ("CCP"). In particular, CCPs define how consistency is maintained between data in the shared cache 102b and the backing data store (for example, system memory 103 or another cache) as the various processing units 102a read from and write data to the shared cache 102b, and how to ensure that the various processing units 102a always read valid data from a given location in the shared cache 102b. CCPs are typically related to, and enable, a memory model defined by the ISA of the processor(s) 102.
[0053] Examples of common CCPs include the MSI protocol (that is, Modified, Shared, and Invalid), the MESI protocol (that is, Modified, Exclusive, Shared, and Invalid), and the MOESI protocol (that is, Modified, Owned, Exclusive, Shared, and Invalid). Each of these protocols defines a state for individual locations (for example, lines) in the shared cache 102b. A "modified" cache location contains data that has been modified in the shared cache 102b, and is therefore potentially inconsistent with the corresponding data in the backing store (for example, system memory 103 or another cache). When a location having the "modified" state is evicted from the shared cache 102b, common CCPs require the cache to guarantee that its data is written back to the backing store, or that another cache takes over this responsibility. A "shared" cache location contains data that is unmodified from the data in the backing store, exists in read-only state, and is shared by the processing unit(s) 102a. The shared cache 102b can evict this data without writing it to the backing store. An "invalid" cache location contains no valid data, and can be considered empty and usable to store data from a cache miss. An "exclusive" cache location contains data that matches the backing store, and is used by only a single processing unit 102a. It may be changed to the "shared" state at any time (that is, in response to a read request), or may be changed to the "modified" state when writing to it. An "owned" cache location is shared by two or more processing units 102a, but one of the processing units has the exclusive right to make changes to it. When that processing unit makes changes, it notifies the other processing units directly or indirectly, since the notified processing units may need to invalidate or update their copies, based on the CCP implementation.
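As a rough, non-limiting illustration of these per-line states, the sketch below models the MESI variant in C++. The transition logic is deliberately simplified (a real CCP also exchanges coherence messages among caches, which is elided here), and all names are illustrative assumptions.

```cpp
#include <cstdint>

// Simplified MESI states for a single cache line, as described above.
enum class CcpState : std::uint8_t { Modified, Exclusive, Shared, Invalid };

// Sketch of the state a line takes in *this* cache when the local
// processing unit reads it. Real protocols also snoop and notify
// other caches; that side is not shown.
CcpState OnLocalRead(CcpState s, bool otherCachesHoldLine) {
    if (s == CcpState::Invalid)  // cache miss: fetch from the backing store
        return otherCachesHoldLine ? CcpState::Shared : CcpState::Exclusive;
    return s;                    // M/E/S lines satisfy reads as-is
}

CcpState OnLocalWrite(CcpState /*s*/) {
    // A write always leaves the local copy Modified; other caches'
    // copies of this line must be invalidated (not shown).
    return CcpState::Modified;
}
```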
[0054] The granularity with which different CCPs track cache coherence, and make that cache coherence data available to the tracker 104a, can vary. For example, at one end of the spectrum, some CCPs track cache coherence per cache line as well as per processing unit. These CCPs can, therefore, track the state of each cache line as it relates to each processing unit.
[0055] Embodiments utilize the processor's shared cache 102b to efficiently record a bit-accurate trace of execution of an application 104c and/or of the operating system kernel 104b. These embodiments are built upon an observation that the processor 102 (including the shared cache 102b) forms a semi- or quasi-closed system. For example, once portions of data for a process (that is, code data and runtime application data) are loaded into the shared cache 102b, the processor 102 can run by itself, without any input, as a semi- or quasi-closed system for bursts of time. In particular, one or more of the processing units 102a execute instructions from the code portion of the shared cache 102b, using runtime data.
[0056] When a processing unit 102a needs some influx of information (for example, because an instruction it is executing, will execute, or may execute accesses code or runtime data that is not already in the shared cache 102b), a "cache miss" occurs and that information is brought into the shared cache 102b from the system memory 103. For example, if a data cache miss occurs when an executed instruction performs a memory operation at a memory address within the application runtime data 103b, data from that memory address is brought into one of the cache lines of the data portion of the shared cache 102b. Similarly, if a code cache miss occurs when an instruction performs a memory operation at a memory address of application code 103a stored in system memory 103, code from that memory address is brought into one of the cache lines of the code portion of the shared cache 102b. The processing unit 102a then continues execution using the new information in the shared cache 102b, until new information is again brought into the shared cache 102b (for example, due to another cache miss or an un-cached read).
[0057] The inventor has observed that, in order to record a bit-accurate representation of execution of an application, the tracker 104a can record enough data to be able to reproduce the influx of information into the shared cache 102b during execution of that application's thread(s). A first approach to doing this is to record all of the data brought into the shared cache 102b by logging all cache misses and un-cached reads (that is, reads from hardware components and un-cacheable memory), along with a time during execution at which each piece of data was brought into the shared cache 102b (for example, using a count of instructions executed or some other counter).
[0058] A second approach, which results in significantly smaller trace files than the first approach, is to track and record the cache lines that were "consumed" by each processing unit 102a. As used herein, a processing unit has "consumed" a cache line when it is aware of its present value. This could be because the processing unit is the one that wrote the present value of the cache line, or because the processing unit performed a read on the cache line. This second approach involves extensions to the shared cache 102b that enable the processor 102 to identify, for each cache line, one or more processing units 102a that consumed the cache line.
[0059] According to embodiments herein, a third approach is to utilize the processor's CCP to determine a subset of the "consumed" cache lines to record in the trace file(s) 104d, which will still enable activity of the shared cache 102b to be reproduced. This third approach results in significantly smaller trace files, and thus significantly lower tracing overheads, than both the first and second approaches.
[0060] Some embodiments herein record trace data streams that correspond to processing units/threads. For example, the trace file(s) 104d could include one or more separate trace data streams for each processing unit. In these embodiments, data packets in each trace data stream may lack identification of the processing unit to which the data packet applies, since this information is inherent based on the trace data stream itself. In these embodiments, if the computer system 101 includes multiple processors 102 (that is, within different processor sockets), the trace file(s) could have one or more different trace data streams for each processing unit 102a of the different processors 102. Plural data streams could even be used for a single thread. For example, some embodiments could associate one data stream with a processing unit used by a thread, and associate one or more additional data streams with each shared cache used by the thread.
[0061] In other embodiments, the trace file(s) 104d could include a single trace data stream for the processor 102, and could identify in each data packet which processing unit the data packet applies to. In these embodiments, if the computer system 101 includes multiple processors 102, the trace file(s) 104d could include a separate trace data stream for each of the multiple processors 102. Regardless of the layout of the trace file(s), data packets for each processing unit 102a are generally recorded independently of the other processing units, enabling different threads that executed at different processing units 102a to be replayed independently. The trace files can, however, include some information, whether express or inherent, that provides a partial ordering among the different threads.
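For illustration, a hypothetical packet layout contrasting these two stream layouts is sketched below: with one stream per processing unit the unit identifier can be omitted (it is inherent in the stream), while a single per-processor stream carries it in each packet. All field names are assumptions, not a format defined by this disclosure.

```cpp
#include <cstdint>
#include <optional>

// Sketch of a trace data packet. With one trace data stream per
// processing unit, `unit` can be omitted; with a single stream per
// processor 102, it is recorded in each packet.
struct TracePacket {
    std::optional<std::uint8_t> unit;  // processing unit id, if a shared stream
    std::uint64_t instructionCount;    // per-unit ordering information
    std::uint64_t lineAddress;         // cache line address (@)
    std::uint8_t  data[64];            // cache line value, when required
};
```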
[0062] Figure 3 illustrates a flowchart of a method 300 for performing a cache-based trace recording using CCP data. Method 300 may include acts that are performed by the processor 102 as the tracker 104a traces the application 104c and/or the operating system kernel 104b. The actions taken by the processor 102 may be based on hard-coded logic in the processor 102, soft-coded logic (that is, microcode 102c), and/or another software application such as the tracker 104a, the operating system kernel 104b, or a hypervisor. Although Figure 3 illustrates a sequence of acts, it will be appreciated that embodiments could perform many of these acts in any order, with some even being performed in parallel. As such, the sequence of acts shown in method 300 is non-limiting.
[0063] As shown, method 300 includes an act 301 of detecting an interaction between a cache and a backing store. In some embodiments, act 301 comprises detecting an operation that causes an interaction between a particular cache line of a plurality of cache lines and one or more backing stores. For example, while executing a thread of the application 104c or of the operating system kernel 104b at one of the processing units 102a, the processing unit can cause an interaction between a line in the shared cache 102b and a backing store (for example, system memory 103, or another cache). Detection can be performed, for example, by the processor 102 based on execution of its microcode 102c.
[0064] Method 300 also includes an act 302 of identifying a processing unit that caused the interaction. In some embodiments, act 302 comprises identifying a particular processing unit of the plurality of processing units that caused the operation. For example, based on execution of the microcode 102c, the processor 102 can identify which of the processing units 102a caused the operation detected in act 301.
[0065] Method 300 also includes an act 303 of determining whether logging is enabled for the processing unit. In some embodiments, act 303 comprises using one or more logging control bits to determine that logging is enabled for the particular processing unit. For example, the processor 102 can determine whether the processing unit identified in act 302 has logging enabled, based on one or more logging control bits. Use of logging control bit(s) enables logging by different processing units to be dynamically enabled and disabled. Thus, using logging control bit(s), the tracker 104a can dynamically control which thread(s) are being traced, and/or which portion(s) of execution of different threads are being traced.
[0066] The particular form and function of the logging control bit(s) can vary. In some embodiments, for example, the logging control bit(s) are part of one of the registers 102d, such as a control register. In these embodiments, a single logging control bit could correspond to one processing unit 102a or to a plurality of processing units 102a. Thus, a register 102d could include a single logging control bit (for example, corresponding to all processing units, or to a particular processing unit or subset of processing units), or could potentially include a plurality of logging control bits (for example, each corresponding to one or more processing units). In other embodiments, the logging control bit(s) comprise, or are otherwise associated with, an address space identifier (ASID) and/or a process-context identifier (PCID) that corresponds to the thread that is being traced.
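One way act 303 could look, assuming a hypothetical control register that holds one logging control bit per processing unit, is sketched below; the register name and helper functions are illustrative assumptions only.

```cpp
#include <cstdint>

// Hypothetical control register with one logging control bit per
// processing unit (act 303). Bit i set => logging enabled for unit i.
std::uint64_t g_loggingControlRegister = 0;

bool LoggingEnabledFor(unsigned processingUnit) {
    return (g_loggingControlRegister >> processingUnit) & 1u;
}

// Tracker-side helpers to dynamically enable/disable tracing of the
// thread(s) executing at a given unit, as described above.
void EnableLogging(unsigned unit)  { g_loggingControlRegister |=  (1ull << unit); }
void DisableLogging(unsigned unit) { g_loggingControlRegister &= ~(1ull << unit); }
```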
[0067] Method 300 also includes an act 304 of determining whether a cache line participates in logging. In some embodiments, act 304 comprises, based at least on logging being enabled for the particular processing unit, determining whether the particular cache line is a participant in logging. For example, the processor 102 can determine whether the cache line involved in the operation detected in act 301 is involved in logging. As will be discussed in more detail later, there are several mechanisms that can be used for the detection, such as use of bits within the shared cache 102b, or use of cache way-locking.
[0068] Method 300 also includes an act 305 of using a CCP to identify that there is data to be logged to a trace. For example, the processor 102 can consult its CCP to determine which cache state transitions occurred as a result of the operation, and whether those transitions warrant logging.
[0069] Method 300 also includes an act 306 of logging appropriate data to the trace using the CCP. In some embodiments, act 306 comprises causing that data to be logged to the trace, the data usable to replay the operation. When data is to be logged to the trace file(s), one or more data packets can be added to the appropriate trace data stream(s), such as a trace data stream corresponding to the particular processing unit, or a trace data stream corresponding to the processor 102 generally. If the appropriate trace data stream corresponds to the processor 102 generally, the one or more data packets can identify the particular processing unit. Note that if the trace data stream corresponds to the processor 102 generally, the inherent order of the data packets in the data stream itself provides some additional ordering information that may not be available if multiple data streams are used.
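Putting acts 301-306 together, a minimal control-flow sketch follows. This is merely one illustrative reading of Figure 3, not the claimed implementation; the helper functions are assumed to exist and correspond to the acts indicated in the comments.

```cpp
#include <cstdint>

// Assumed helpers, each corresponding to an act of method 300.
bool LoggingEnabledFor(unsigned unit);                          // act 303
bool LineParticipatesInLogging(std::uint64_t lineAddr);         // act 304
bool CcpIndicatesDataToLog(unsigned unit, std::uint64_t addr);  // act 305
void LogToTraceStream(unsigned unit, std::uint64_t lineAddr);   // act 306

// Acts 301/302: an interaction between a cache line and a backing
// store was detected and attributed to processing unit `unit`.
void OnCacheBackingStoreInteraction(unsigned unit, std::uint64_t lineAddr) {
    if (!LoggingEnabledFor(unit)) return;                // act 303
    if (!LineParticipatesInLogging(lineAddr)) return;    // act 304
    if (!CcpIndicatesDataToLog(unit, lineAddr)) return;  // act 305
    LogToTraceStream(unit, lineAddr);                    // act 306
}
```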
[0070] It is noted that, where the shared cache 102b comprises multiple cache levels, in some embodiments method 300 operates at the cache level that interacts with the system memory 103, since it is this cache level that processes cache misses. Operating at this level enables the cache activity of each processing unit 102a to be represented, without being redundant (that is, representing a unit's activity more than once). Thus, for example, if the computer system 101 includes two processors 102 (that is, two processor sockets) and comprises one "inclusive" L3 cache per socket, as well as "inclusive" L2 caches
[0071] As mentioned above in connection with act 304, there are several mechanisms that can be used by the processor 102 to determine whether a cache line is a "participant in logging". One is to extend each line of the shared cache 102b with one or more additional "accounting bits" that can be used as a flag, as processing unit identifiers, or as a processor index. The logic for controlling these "accounting bits" can be part of the processor's microcode 102c.
[0072] To illustrate this embodiment, Figure 4A illustrates an exemplary shared cache 400a, similar to the shared cache 200 of Figure 2, that extends each of its cache lines 404 with one or more additional accounting bits 401. Thus, each cache line 404 includes accounting bit(s) 401, conventional address bits 402, and value bits 403.
[0073] In some implementations, each cache line's accounting bit(s) 401 comprise a single bit that functions as a flag (that is, on or off) used by the processor 102 to indicate whether or not the cache line is participating in trace logging. If the processor's CCP has sufficient granularity (for example, if the CCP tracks the coherence state for each cache line as it relates to each processing unit), this single flag bit can suffice to facilitate robust trace recording.
[0074] In other implementations, each line's accounting bits 401 include a plurality of bits. Pluralities of bits could be used in several ways. Using one approach, referred to herein as "unit bits", each cache line's accounting bits 401 can include a number of unit bits equal to the number of processing units 102a of the processor 102 (for example, the number of logical processing units if the processor 102 supports hyperthreading, or the number of physical processing units if hyperthreading is not supported). These unit bits can be used by the processor 102 to track which one or more particular processing units have consumed the cache line (or, if the cache line has not been consumed, to note that none of the processing units have consumed it). Thus, for example, a shared cache 102b that is shared by two processing units 102a could include two unit bits for each cache line. In connection with these unit bits added to each cache line, embodiments extend the processor's microcode 102c to utilize these unit bits to track whether or not the current value in the cache line has been logged (that is, to the trace file 104d) on behalf of each processing unit, or is otherwise known to the processing unit. If the processor's CCP has coarser granularity (for example, if the CCP tracks coherence state at the level of the cache line only), these unit bits can provide additional information to facilitate robust tracing. For example, if a cache line is marked as shared or exclusive by the CCP, the unit bits can be used to identify which processing unit(s) share the cache line, or which processing unit has the exclusivity.
[0075] Using another approach, referred to herein as "index bits", each cache line's accounting bits 401 can include a number of index bits sufficient to represent an index to each of the processing units 102a of the processor(s) 102 of the computer system 101 that participate in logging, along with a "reserved" value (for example, -1). For example, if the processor(s) 102 of the computer system 101 include 128 processing units 102a, these processing units can be identified by an index value (for example, 0-127) using only seven index bits per cache line. In some embodiments, one index value is reserved (for example, "invalid") to indicate that no processor has logged a cache line. Thus, this would mean that the seven index bits would actually be able to represent 127 processing units 102a, plus the reserved value. For example, binary values 0000000 - 1111110 could correspond to index locations 0-126 (decimal), and binary value 1111111 (for example, -1 or 127 decimal, depending on interpretation) could correspond to "invalid", to indicate that no processor has logged the corresponding cache line, although this notation could vary depending on the implementation. Thus, the index bits can be used by the processor 102 to track whether the cache line is participating in trace logging (for example, a value other than -1), and as an index to a processing unit that consumed the cache line (for example, the processing unit that most recently consumed it). This second approach has the advantage of supporting a great number of processing units with little overhead in the shared cache 102b, with the disadvantage of less granularity than the first approach (that is, only one processing unit is identified at a time). Again, if the processor's CCP has coarser granularity (for example, if the CCP tracks coherence state at the level of the cache line only), these index bits can provide additional information to facilitate robust tracing. For example, if a cache line is marked as shared or exclusive by the CCP, the index bits can be used to identify at least one processing unit that shares the cache line, or which processing unit has the exclusivity.
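The two encodings could be modeled as follows. This is a sketch under the assumptions of at most 64 processing units for the unit-bit form and the 7-bit index with reserved value -1 described above; the type and member names are illustrative only.

```cpp
#include <cstdint>

// Unit-bits form: one bit per processing unit (here, up to 64 units).
// Bit i set => unit i has consumed the cache line's current value.
struct UnitBits {
    std::uint64_t consumedBy = 0;
    void MarkConsumed(unsigned unit) { consumedBy |= (1ull << unit); }
    void Reset() { consumedBy = 0; }  // no unit has consumed the line
};

// Index-bits form: seven bits encode the index of the one processing
// unit that most recently consumed the line; the reserved value
// 1111111 (that is, -1, or 127 decimal) means that no processor has
// logged this cache line.
struct IndexBits {
    static constexpr std::uint8_t kInvalid = 0x7F;  // reserved value
    std::uint8_t index : 7;
    bool Participating() const { return index != kInvalid; }
};
```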
[0076] Another mechanism that can be used by the processor 102 to determine whether a cache line is a participant in logging can employ the concepts discussed in connection with Figure 4A, but without extending each cache line with additional accounting bit(s) 401. Instead, this mechanism reserves one or more of the cache lines 404 for storing accounting bits. Figure 4B illustrates an example of a shared cache 400b that includes conventional cache lines 405 that store memory addresses 402 and values 403, as well as one or more reserved cache line(s) 406 for storing accounting bits that apply to the conventional cache lines 405. The bits of the reserved cache line(s) 406 are allocated into different groups of accounting bits that each correspond to a different one of the conventional cache lines 405. These groups of accounting bits could function as a flag bit, unit bits, or index bits, depending on the implementation.
[0077] Another mechanism that can be used by the processor(s) 102 to determine whether a cache line is a participant in logging is to utilize set-associative caches and way-locking. Since a processor's shared cache 102b is generally much smaller than the system memory 103 (often by orders of magnitude), there are usually far more memory locations in the system memory 103 than there are lines in the shared cache 102b. As such, each processor defines a mechanism for mapping multiple memory locations of system memory to line(s) in a cache. Processors generally employ one of two general techniques: direct mapping and associative mapping. Using direct mapping, different memory locations in system memory 103 are mapped to just one line in the cache, such that each memory location can only be cached into a particular line in the cache.
[0078] Using associative mapping, on the other hand, different locations in system memory 103 can be cached to one of multiple lines in the shared cache 102b. Figure 5 illustrates an example 500 of associative cache mappings. Here, cache lines 504 of a cache 502 are logically partitioned into different address groups of two cache lines each, including a first address group of two cache lines 504a and 504b (identified as index 0), and a second address group of two cache lines 504c and 504d (identified as index 1). Each cache line in an address group is associated with a different "way", such that cache line 504a is identified by index 0, way 0, cache line 504b is identified by index 0, way 1, and so on. As further depicted, memory locations 503a, 503c, 503e, and 503g (memory indexes 0, 2, 4, and 6) are mapped to the address group of index 0.
[0079] Associative caches are generally referred to as N-way associative caches, where N is the number of "ways" in each address group. Thus, the cache 502 of Figure 5 would be referred to as a 2-way associative cache. Processors commonly implement N-way caches where N is a power of two (for example, 2, 4, 8, etc.), with N values of 4 and 8 being commonly chosen (although the embodiments herein are not limited to any particular N values or subsets of N values). Notably, a 1-way associative cache is generally equivalent to a direct-mapped cache, since each address group contains only one cache line. Additionally, if N equals the number of lines in the cache, it is referred to as a fully associative cache, since it comprises a single address group containing all lines in the cache. In fully associative caches, any memory location can be cached to any line in the cache.
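For illustration, the usual arithmetic for locating a memory address's address group in an N-way cache is sketched below; the parameter values are example assumptions (Figure 5 has two address groups), and real processors derive these fields from fixed address bits, as Figure 10A later suggests.

```cpp
#include <cstdint>

// Sketch: which address group (set index) a memory address maps to in
// an N-way associative cache with 64-byte lines. Any of the N ways
// within that group may hold the line.
constexpr std::uint64_t kLineBytes = 64;  // bytes per cache line
constexpr std::uint64_t kNumGroups = 2;   // address groups (as in Figure 5)

std::uint64_t AddressGroupIndex(std::uint64_t address) {
    return (address / kLineBytes) % kNumGroups;  // low bits pick the group
}
```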
[0080] It is noted that Figure 5 represents a simplified view of system memory and caches, in order to illustrate general principles. For example, although Figure 5 maps individual memory locations to cache lines, it will be appreciated that each line in a cache generally stores data relating to multiple addressable locations in system memory. Thus, in Figure 5, each location (503a-503h) in the system memory (501) may actually represent a plurality of memory locations.
[0081] Associative caches can be used for determining whether a cache line is a participant in logging through use of way-locking. Way-locking locks or reserves certain ways in a cache for some purpose. In particular, embodiments herein utilize way-locking to reserve one or more ways for a thread that is being traced, so that the locked/reserved ways are used exclusively for storing cache misses relating to execution of that thread. Thus, referring back to Figure 5, if "way 0" were locked for a traced thread, then cache lines 504a and 504c (that is, index 0, way 0 and index 1, way 0) would be used exclusively for cache misses relating to execution of that thread, and the remaining cache lines would be used for all other cache misses. Thus, in order to determine whether a particular cache line is a participant in logging, the processor 102 need only determine whether the cache line is part of a way that is reserved for the thread that is being traced.
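Under way-locking, the participation test of act 304 reduces to checking which way a line occupies. A sketch, assuming a hypothetical bitmask of ways reserved for the traced thread, follows; with "way 0" locked as in the example above, bit 0 of the mask is set.

```cpp
#include <cstdint>

// Hypothetical mask of ways reserved for the traced thread.
constexpr std::uint32_t kLoggedWaysMask = 0b0001;  // "way 0" locked

// Act 304 under way-locking: a cache line participates in logging
// exactly when it resides in one of the reserved ways.
bool LineParticipatesInLogging(unsigned way) {
    return (kLoggedWaysMask >> way) & 1u;
}
```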
[0082] Figures 6A-6D illustrate a concrete example 600 of application of the method 300 of Figure 3, in the context of Figures 1, 2, 4A, 4B, and 5. Figure 6A illustrates a first table 600a that shows read and write activity by four processing units 102a (that is, P0-P3) on a single line in the shared cache 102b. Figure 6B illustrates a second table 600b that indicates the tracked cache coherence state (for example, as tracked using the processor's CCP) based on these reads and writes. Figure 6C illustrates a third table 600c that shows what could be stored in accounting bits of the shared cache 102b (as described in connection with Figures 4A and 4B), if accounting bits are used. Although only one type of accounting bits would typically be used (that is, unit bits per line, index bits per line, or a flag bit per line), for completeness of description table 600c shows each of unit bits 603, index bits 604, and flag bit 605. Finally, Figure 6D illustrates a fourth table 600d that shows exemplary types of log data 606 that could be written to the trace file(s) 104d in connection with each operation.
[0083] For simplicity in description, table 600a shows operations by only a single processing unit 102a at a time, but it will be appreciated that the principles described herein apply to situations in which there is concurrent activity (for example, concurrent reads by two or more processing units of the same cache line). Additionally, the examples described in connection with Figures 6A-6D assume that logging is enabled for processing units P0-P2, and is disabled for processing unit P3. For example, as discussed above, this could be controlled with a bit corresponding to each processing unit, such as a plurality of bits of a control register.
[0084] Initially, for ease in description, this example will use simplified cache line states that are derived from the cache line states (that is, Modified, Owned, Exclusive, Shared, and Invalid) used in the CCPs discussed above (that is, MSI, MESI, and MOESI). In this simplification, these states map to either a "read" state (that is, the cache line has been read from) or a "write" state (that is, the cache line has been written to).
[0085] Notably, embodiments could log CCP data at varying levels, depending on what data is available from the processor 102 and/or based on implementation choices. For example, CCP data could be logged based on "mapped" CCP states (such as those shown in Table 1), based on actual CCP states (for example, Modified, Owned, Exclusive, Shared, and/or Invalid) that are made visible by the processor 102, and/or even based on lower-level "raw" CCP data that may not typically be made visible by the processor 102.
[0086] Returning to Figures 6A-6D, table 600a includes a first column 601 that shows an identifier (ID), which is used to specify a global order among the operations. Table 600a also includes four additional columns 602a-602d that each correspond to one of the processing units. Although, for simplicity, this example uses a global ID, it will be appreciated that in practice each processing unit would normally order operations using its own independent sets of identifiers. These IDs could comprise an instruction count (IC), or any other suitable count.
[0087] As shown in table 600a, at ID[0] the processing unit P0 performs a read, which causes a cache miss that brings data DATA[1] into the cache line. Correspondingly, table 600b shows that the processor's CCP notes that the cache line is now "shared" by P0. Table 600c shows that if unit bits 603 are used they indicate that processing unit P0 has consumed (that is, read) the cache line (and that processing units P1-P3 have not), that if index bits 604 are used they indicate that P0 has consumed the cache line, and that if a flag bit 605 is used it indicates that some processing unit has consumed the cache line. Given this status, at act 303 the processor 102 would determine that logging is enabled for P0, and at act 304 it would determine that the cache line is a participant in logging (that is, using unit bits 603, index bits 604, flag bit 605, or way-locking). Thus, at act 306, the processor 102 would use the CCP to log appropriate data to the trace file(s), to the extent necessary. Here, since the cache line is going from an invalid (empty) state to a read (table 600a)/shared (table 600b) state, data should be logged. As shown in log data 606 of table 600d, the processor 102 could note the processing unit (P0), if necessary (that is, depending on whether data packets are being logged to separate data streams per processing unit, or to a single data stream); the cache line address (@); the instruction count or some other count; and the data (DATA[1]) that was brought into the cache line. Although, as discussed above, the instruction count will typically be a processing-unit-specific value, for simplicity table 600d refers to instruction counts in reference to the corresponding global ID (that is, IC[0], in this case).
[0088] It is noted that the cache line address (@) and the data (for example, DATA[1]) could, in some embodiments, be compressed within the trace file(s) 104d. For example, memory addresses can be compressed by refraining from recording the "high" bits of a memory address, by referencing (either expressly or implicitly) the "high" bits of a previously-recorded memory address. The data can be compressed by grouping the bits of a data value into a plurality of groups comprising a plurality of bits each, and associating each group with a corresponding "flag" bit. If a group equals a particular pattern (for example, all 0's, all 1's, etc.), the flag bit can be set, and that group of bits need not be stored in the trace.
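As an illustration of this value-compression idea, the sketch below splits a 64-bit value into eight 8-bit groups and emits a flag byte marking groups equal to the chosen pattern (all 0's here), which are then omitted. The exact grouping and patterns are implementation choices assumed for illustration.

```cpp
#include <cstdint>
#include <vector>

// Sketch of the flag-bit compression described above: a 64-bit value
// is split into eight 8-bit groups; each group equal to the pattern
// (all 0's here) has its flag bit set and is omitted from the trace.
std::vector<std::uint8_t> CompressValue(std::uint64_t value) {
    std::vector<std::uint8_t> out(1, 0);         // out[0] = flag byte
    for (int g = 0; g < 8; ++g) {
        std::uint8_t group = (value >> (g * 8)) & 0xFF;
        if (group == 0)
            out[0] |= (1u << g);                 // flagged: group omitted
        else
            out.push_back(group);                // store non-pattern group
    }
    return out;
}
```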
[0089] Next, table 600a shows that at ID[1] the processing unit P1 performs a read on the cache line, reading data DATA[1]. Table 600b shows that the processor's CCP notes that the cache line is now "shared" by P0 and P1. Table 600c shows that processing units P0 and P1 have consumed the cache line (unit bits 603), that P1 has consumed the cache line (index bits 604), or that some processing unit has consumed the cache line (flag bit 605).
[0090] Next, table 600a shows that at ID[2] the processing unit P0 performs a write on the cache line, writing data DATA[2]. Table 600b shows that the processor's CCP notes that the cache line is now "modified" by P0 and "invalid" for P1. Table 600c shows that only processing unit P0 has consumed (that is, updated the value of) the cache line (unit bits 603), that P0 has consumed the cache line (index bits 604), or that some processing unit has consumed the cache line (flag bit 605). Table 600d shows that, using the CCP, the processor 102 determines that a record of the operation needs to be logged, since the cache line was written to/modified. As shown, the processor 102 could note the processing unit (P0); the cache line address (@); the instruction count (IC[2]); that the cache line went from a read (shared) state to a write (modified) state; and that P0 and P1 previously had access to the cache line, but now only P0 has access.
[0091] Note that information about which processing unit(s) previously had access to the cache line can be available if the CCP tracks cache coherence per processing unit, or if unit bits 603 are used.
[0092] If none of these are used (for example, if the CCP is not as robust, and if index bits 604, flag bit 605, or way-locking are used instead of unit bits 603), the log data 606 may be less complete, or larger. As a first example, if the CCP tracks coherence state at the level of the cache line only, and if index bits 604 are used, these two can be used to identify that the cache line's state is invalid (for all processing units), is modified (along with the index of the processing unit that modified it), is exclusive (along with the index of the processing unit that has it exclusively), or is shared (and all processing units have access). This can result in a simpler hardware implementation, with the disadvantage that when it is time to change the cache line from shared to modified or exclusive, all the processing units need to be notified, instead of only those that a more granular CCP would know to be sharing the cache line. As a second example, index bits 604 could be used to identify the last processing unit that accessed the cache line. Then, if the cache is inclusive (that is, where many reads are hidden behind accesses at the L2 or L1 cache levels), even if the processing units are reading the same cache line, an L3 cache may see relatively few repeated requests from the same processing units. Logging each index change on a read->read transition, and then having read->write, write->write, and write->read transitions also log the index, provides the same data as using unit bits 603, at a potentially slightly greater trace cost. As a third example, each cache line could include a single flag bit, but the CCP could track the coherence state of each cache line in reference to an index of a processing unit that owns the cache line's coherence state. Here, the trace may record more cache line movements than if unit bits were used or if the CCP tracked individual processing units, but the trace can still be fully deterministic. A brief comparison of trace file size when having information per processing unit, versus only processor index information, appears hereinafter in connection with Figures 9A and 9B.
[0093] Returning to Figure 6A, table 600a shows that at ID[3] the processing unit P1 performs a read on the cache line, reading data DATA[2]. Table 600b shows that the processor's CCP notes that the cache line is now "shared" by P0 and P1. Table 600c shows that processing units P0 and P1 have consumed the cache line (unit bits 603), that P1 has consumed the cache line (index bits 604), or that some processing unit has consumed the cache line (flag bit 605). Note that it would also be correct for the index bits 604 to still reference P0 instead of P1. Table 600d shows that, using the CCP, the processor 102 determines that a record of the operation needs to be logged, since the cache line went from a write (modified) state to a read (shared) state. As shown, the processor 102 could note the processing unit (P1); the cache line address (@); the instruction count (IC[3]); that the cache line went from a write (modified) state to a read (shared) state; and that P0 previously had access to the cache line, but now P0 and P1 have access.
[0094] [0094] Next, table 600a shows that in ID[4] the processing unit P0 again writes to the cache line, this time writing data DATA[3]. Table 600b shows that the processor's CCP notes that the cache line is again "modified" by P0 and "invalid" for P1. Table 600c shows that only processing unit P0 consumed the cache line (unit bits 603), that P0 consumed the cache line (index bits 604), or that some processing unit consumed the cache line (signaling bit 605). Table 600d shows that, using the CCP, processor 102 determines that a record of the operation should be logged, since the cache line was written/modified. As shown, processor 102 could note the processing unit (P0); the cache line address (@); the instruction count (IC[4]); that the cache line went from a read (shared) state to a write (modified) state; and that P0 and P1 had access to the cache line before, but now only P0 has access.
[0095] [0095] Next, table 600a shows that in ID[5] the processing unit P2 performs a read of the cache line, reading data DATA[3]. Table 600b shows that the processor's CCP notes that the cache line is now "shared" by P0 and P2. Table 600c shows that processing units P0 and P2 consumed the cache line (unit bits 603), that P2 consumed the cache line (index bits 604), or that some processing unit consumed the cache line (signaling bit 605). Table 600d shows that, using the CCP, processor 102 determines that a record of the operation should be logged, since the cache line went from a write (modified) state to a read (shared) state. As shown, processor 102 could note the processing unit (P2); the cache line address (@); the instruction count (IC[5]); that the cache line went from a write (modified) state to a read (shared) state; and that P0 had access to the cache line before, but now P0 and P2 have access.
[0096] [0096] Next, table 600a shows that in ID[6] the processing unit P1 performs a read of the cache line, also reading data DATA[3]. Table 600b shows that the processor's CCP notes that the cache line is now "shared" by P0, P1 and P2. Table 600c shows that processing units P0, P1 and P2 consumed the cache line (unit bits 603), that P1 consumed the cache line (index bits 604), or that some processing unit consumed the cache line (signaling bit 605). Note that it would also be correct for index bits 604 to still reference P0 or P2 instead of P1. Table 600d shows that, using the CCP, processor 102 determines that a record of the operation should be logged. As shown, processor 102 could note the processing unit (P1); the cache line address (@); the instruction count (IC[6]); that the cache line went from a read (shared) state to a read (shared) state; and that P0 and P2 had access to the cache line before, but now P0, P1 and P2 have access.
[0097] [0097] Next, table 600a shows that in ID[7] the processing unit P3 performs a read of the cache line, also reading data DATA[3]. Table 600b shows that the processor's CCP notes that the cache line is now "shared" by P0, P1, P2 and P3. Table 600c shows that none of the unit bits 603, index bits 604 or signaling bit 605 has been updated. This is because logging is disabled for P3, and thus, for tracing purposes, it did not "consume" the cache line by performing the read. Table 600d shows that no data was logged. This is because, in act 303, processor 102 would determine that logging is not enabled for P3.
[0098] [0098] Next, table 600a shows that in ID[8] the processing unit P3 performs a write to the cache line, writing data DATA[4]. Table 600b shows that the processor's CCP notes that the cache line is now "invalid" for P0, P1 and P2, and "modified" by P3. Table 600c shows that unit bits 603, index bits 604 and signaling bit 605 all reflect the cache line as not having been consumed by any processing unit. This is because logging is disabled for P3, so, for tracing purposes, it did not "consume" the cache line when it performed the write; moreover, the write invalidated the value in the cache line for the other processing units. Table 600d shows that no data was logged. Again, this is because, in act 303, processor 102 would determine that logging is not enabled for P3.
[0099] [0099] Next, table 600a shows that in ID[9] the processing unit P0 performs a write to the cache line, writing data DATA[5]. Table 600b shows that the processor's CCP notes that the cache line is now "modified" by P0 and "invalid" for P3. Table 600c shows that no processing unit has consumed the cache line. This is because no log entry was made in connection with this operation, as reflected in table 600d. No log entry needs to be made because the written data will be reproduced through normal execution of the instructions of P0's thread. However, an entry could optionally be written to the trace in this circumstance (that is, a write by a logging-enabled processing unit to a cache line that is not logged) to provide extra data for a trace consumer. In this circumstance, the log entry could be treated as a read of the cache line value, plus a write of DATA[5].
[00100] [00100] Next, table 600a shows that in ID[10] the processing unit P2 performs a read of the cache line, reading data DATA[5]. Table 600b shows that the processor's CCP notes that the cache line is now "shared" by P0 and P2. Table 600c shows that processing unit P2 consumed the cache line (unit bits 603), that P2 consumed the cache line (index bits 604), or that some processing unit consumed the cache line (signaling bit 605). Table 600d shows that, using the CCP, processor 102 determines that a record of the operation should be logged, since the cache line value was not logged previously (that is, it was not logged in ID[9]). As shown, processor 102 could note the processing unit (P2); the cache line address (@); the instruction count (IC[10]); the data (DATA[5]) that was brought into the cache line; and that P2 has access to the cache line. It may also be possible to log that P0 also has access to the cache line, depending on what information the specific CCP and count bits provide.
[00101] [00101] Next, table 600a shows that in ID[11] the processing unit P1 performs a read of the cache line, also reading data DATA[5]. Table 600b shows that the processor's CCP notes that the cache line is now "shared" by P0, P1 and P2. Table 600c shows that processing units P1 and P2 consumed the cache line (unit bits 603), that P1 consumed the cache line (index bits 604), or that some processing unit consumed the cache line (signaling bit 605). Note that it would also be correct for index bits 604 to still reference P2 instead of P1. Table 600d shows that, using the CCP, processor 102 determines that a record of the operation should be logged. As shown, processor 102 could note the processing unit (P1); the cache line address (@); the instruction count (IC[11]); that the cache line went from a read (shared) state to a read (shared) state; and that P2 had access to the cache line before, but now P1 and P2 have access. Note that the value (DATA[5]) does not need to be logged, since it was logged by P2 in ID[10].
[00102] [00102] Next, table 600a shows that in ID[12] the processing unit P0 performs a read of the cache line, also reading data DATA[5]. Table 600b shows that the processor's CCP still notes that the cache line is "shared" by P0, P1 and P2. Table 600c shows that processing units P0, P1 and P2 consumed the cache line (unit bits 603), that P0 consumed the cache line (index bits 604), or that some processing unit consumed the cache line (signaling bit 605). Note that it would also be correct for index bits 604 to still reference P1 or P2 instead of P0. Table 600d shows that, using the CCP, processor 102 could determine that a record of the operation should be logged. In this case, processor 102 may note the processing unit (P0); the cache line address (@); the instruction count (IC[12]); that the cache line went from a read (shared) state to a read (shared) state; and that P1 and P2 had access to the cache line before, but now P0, P1 and P2 have access. No value (DATA[5]) is logged, since it is available from P2's entry in ID[10].
[00103] [00103] Alternatively, it might be possible for processor 102 to log only a reference to P0 in ID[12], since P0 already had the value of the cache line (that is, because it wrote that value in ID[9]). It might even be possible to refrain from any log entry in ID[12], since heuristics could be used at replay time to recover the value (that is, DATA[5]) without any information referencing P0 being in the trace. However, these techniques can be computationally expensive and reduce the system's ability to detect when replay has "failed". An example heuristic is to recognize that memory accesses across processing units are generally strongly ordered (based on the CCP data), so that replay could use the last value written across these units for a given memory location.
[00104] [00104] Next, table 600a shows that in ID[13] the cache line is removed. As a result, table 600b shows that the CCP entries are empty, table 600c shows that the count bits do not reflect any processing unit as having consumed the cache line, and table 600d shows that no data is logged.
[00105] [00105] Note that although, for completeness, the log data 606 lists all final access states (that is, which processing units now have access to the cache line), this information is potentially implicit, and the trace file size can be reduced by omitting it. For example, in a write->read transition, the list of processing units that have access after the read is always the processing unit that performed the read plus the one that performed the write.
[00106] [00106] In general, in order to generate a fully deterministic trace file, a CCP would dictate that all transitions (that is, write->read, write->write, read->write and read->read) across processing units (for example, from P0 to P1) be logged. However, transitions within the same processing unit (for example, from P0 to P0) do not need to be logged. These need not be logged because they will be reproduced through normal execution of the thread that executed on that processing unit.
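As a minimal sketch of this filtering rule (with hypothetical names), the following C++ fragment logs only cross-unit transitions:

```cpp
#include <iostream>
#include <string>

// Sketch of the rule in paragraph [00106]: log cross-unit transitions; skip
// same-unit transitions, which replay reproduces by re-executing the thread.
bool needsLogEntry(int prevUnit, int currUnit) {
    return prevUnit != currUnit;
}

std::string describe(int prevUnit, int currUnit) {
    return needsLogEntry(prevUnit, currUnit) ? "log packet" : "skip";
}

int main() {
    std::cout << describe(0, 1) << '\n';  // P0 -> P1: log packet
    std::cout << describe(0, 0) << '\n';  // P0 -> P0: skip
}
```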
[00107] [00107] It will be appreciated that, using data such as that logged in the example above, and with the additional knowledge of the CCP used by the processor 102 on which the recording was made, a total ordering of the operations that occurred within each thread can be reconstructed, and at least a partial ordering of the operations across the different processing units can be reconstructed. Thus, whether through an indexing process and/or through replay of the trace file, each of the operations above can be reconstructed, even though not all of them were expressly recorded in the trace file(s) 104d.
[00108] [00108] In some modalities, tracker 104a may record additional data packets in the trace file(s) 104d in order to improve the record of the ordering of operations across processing units. For example, tracker 104a could record, with some events, ordering information such as monotonically incrementing numbers (MINs) (or some other counter/timer) in order to provide a total ordering of the events that carry a MIN (or other counter/timer) across threads. These MINs could be used to identify data packets that correspond to events that are defined to be "orderable" across threads. These events could be defined based on a "trace memory model" that defines how threads can interact through shared memory and through their shared use of data in memory. As another example, tracker 104a could (periodically or at random) record a hash of processor state based on a defined deterministic algorithm and a defined set of registers (for example, program counter, stack, general-purpose registers, etc.). As yet another example, tracker 104a could (periodically or at random) forcibly log cache line data. As yet another example, tracker 104a could include in the trace "transition" packets that log a hash of all or a portion (for example, a few bits) of the data they implicitly carry. Thus, when this implicit data is reconstructed at replay, the appropriate portions of the implicit data can be hashed and matched against these transition packets to help identify their ordering. This can be useful, for example, if the CCP cannot track processor indices associated with cache lines when the cache lines are in the shared state.
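For illustration, a minimal C++ sketch of MIN-based ordering packets follows; the names (OrderingPacket, emitOrderingPacket) are hypothetical, and a hardware implementation would maintain the counter in the processor rather than in software.

```cpp
#include <atomic>
#include <cstdint>
#include <iostream>

// Each "orderable" event receives a unique, monotonically incrementing
// number (MIN), giving replay a total order across threads for such events.
std::atomic<uint64_t> g_min{0};

struct OrderingPacket {
    uint64_t min;       // global sequence number
    uint32_t threadId;  // which trace data stream this belongs to
};

OrderingPacket emitOrderingPacket(uint32_t threadId) {
    // fetch_add hands out MINs atomically, even across concurrent threads.
    return OrderingPacket{g_min.fetch_add(1), threadId};
}

int main() {
    OrderingPacket a = emitOrderingPacket(0);
    OrderingPacket b = emitOrderingPacket(1);
    std::cout << a.min << " < " << b.min << '\n';  // replay orders a before b
}
```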
[00109] [00109] When tracker 104a writes additional data packets to tracking file (s) 104d, in order to improve order
[00110] [00110] Figure 7A illustrates an example in which some read->read transitions could be omitted from the trace, depending on how the processing units are logged. Similar to Figure 6A, Figure 7A includes a table 700a with a global ID 701 and three columns (702a-702c) that correspond to three processing units (P0-P2). Omitting some read->read transitions is based on two observations. First, writes need to be ordered; however, all reads between two consecutive writes (for example, the reads in ID[3]-ID[7]) will read the same value, so the order among those reads is irrelevant (and thus a trace that omits these read->read transitions can still be deterministic). Second, having a read "cross" a write at replay (that is, a read and a write to the same cache line being replayed in the wrong order) means that incorrect data would be used for replay; however, having data (for example, MINs, etc.) to avoid making this error helps to identify valid orderings.
[00111] [00111] In the example shown in table 700a, processing unit P2 only performs reads of shared data, and these shared reads only "steal" from other reads (for example, assuming that ID[9] left the cache line shared). If no log entry is made for any of the read->read transitions (that is, ID[4]-ID[7] and ID[10]), there will be no information in the trace to properly place P2's reads. Based on the writes, it can be concluded that P2 never read the value DATA[1] (since the write in ID[2] did not steal from P2); but given the lack of log entries for P2's read->read transitions (that is, ID[4], ID[7] and ID[10]), all that can be concluded for P2 is that there was at least one read by P2 between ID[2] and ID[8]. If, however, there were log entries for ID[4] and ID[10], the remaining reads that may not need to be logged (that is, ID[5]-ID[7], as shown in Figure 7B) can be placed. Each of these reads belongs to the same read period as the last logged read (that is, the one in ID[4]). These reads can therefore be placed based on what the writes steal (and if no operation steals from a read, then there is no write after it until the next logged packet).
[00112] [00112] In view of table 700a, Figure 7B illustrates a table 700b showing the log data (omitting the read->read transitions highlighted in Figure 7A) that could be written if "unit bits" were used. Figure 7C illustrates a table 700c showing the log data that could be written if "index bits" were used and the index is updated on reads.
[00113] [00113] As briefly mentioned above, some caches include both inclusive and exclusive layers (that is, they are not fully inclusive caches). The logging techniques described herein are applicable to these caches, as well as to purely inclusive or exclusive caches. As an example, Figure 8A illustrates a computing environment 800a that includes two processors 801a/801b (for example, two processors in corresponding sockets). Each processor 801 includes four processing units 802a/802b (for example, physical or logical processing units). Each processor 801 also includes a three-layer cache, including an L1 layer 803a/803b, an L2 layer 804a/804b and an L3 layer 805a/805b. As shown, each cache includes four L1 caches 803, each corresponding to one of the processing units 802. In addition, each cache includes two L2 caches 804, each corresponding to two of the processing units 802. In addition, each cache includes one L3 cache 805 for all of the processing units 802 on the processor 801. The processing units and some of the caches are individually identified; for example, the processing units 802a on processor 801a are identified as A0-A3, its L2 caches are identified as A4 and A5, and its L3 cache is identified as A6. Similar identifiers are used for the corresponding components on processor 801b. The asterisks (*) associated with processing units A0, A1, A2, B0 and B1 indicate that logging is enabled for these processing units.
[00114] [00114] In the computing environment 800a, the caches could exhibit a mix of inclusive and exclusive behaviors. For example, it may be inefficient for the L3 cache A6 of processor 801a to store a cache line when only processing unit A0 is using it. Instead, in that case the cache line could be stored in A0's L1 cache and in the L2 cache A4, but not in A1's L1 cache, in the L2 cache A5, or in the caches below.
[00115] [00115] Figure 8B includes a table 800b that shows exemplary read and write operations performed by some of the processing units 802. The format of table 800b is similar to the format of table 600a. In view of the computing environment 800a and table 800b, three different logging examples are now provided, each using different cache behaviors. These examples are described in the context of the following principles for logging using a CCP: (1) generally, log data when an address (cache line) goes from "not logged" to "logged" (that is, based on the determination that the cache line participates in logging in act 304); (2) generally, refrain from logging when a cache line goes from "logged" to "not logged" or is removed (although the trace is still valid if this data is logged; logging such removals increases the trace size, but provides additional information that can help identify the ordering among trace data streams, can help identify when replay of a trace has "failed", and can support additional trace analysis).
[00116] [00116] In a first example, shown in Figure 8C, the CCP tracks cache line state per processing unit (that is, each core has its own read and write state). In this example, the cache behaves similarly to an inclusive cache, except that there may be data that moves through the cache, or across a socket, that is not available at logging time. For brevity, in these examples the processing units 802 are referred to as "cores", and processors 801a and 801b are referred to as processors A and B or sockets A and B. In addition, a simplified log notation of "ID:core:from:transition (that is, from->to)" is used to represent the types of data that could be logged. This notation is explained in more detail inline. For the first example, the log could include:
[00117] [00117] In ID[0], "0:A0:R[DATA]->[1]", that is, in ID[0], log that core A0 read data DATA[1], per principle 1 above.
[00118] [00118] In ID[1], "1:B0:R[DATA]->[1]", that is, in ID[1], log that core B0 read data DATA[1], also per principle 1 above. If the cache in processor B is not aware that A0 already has the data logged, then processor B logs the data itself. Alternatively, if the cache in processor B is aware that A0 has DATA[1] logged, then the log entry could instead be "1:B0:R[A0]->R".
[00119] [00119] In ID[2], "2:A1:R[A0]->R", that is, in ID[2], log that core A1 performed a read->read transition, and that A0 had access. Since the cache line state is also shared with processor B, the entry could alternatively be "2:A1:R[A0,B0]->R", that is, in ID[2], log that A1 performed a read->read transition, and that A0 and B0 had access. Since crossing sockets is typically more expensive than logging within a socket, the first log entry may be preferred for read->read transitions. When logging to/from writes that cross sockets, however, the logging also crosses the sockets.
[00120] [00120] In ID[3], some modalities log nothing. Alternatively, since core A2 has not logged anything yet, and the first thing it does is a write, this could be logged as a read->write. Either way, since a write occurred, the other cores have their cache line state invalidated. The cost (for example, in terms of trace data) of logging the read->write in ID[3] would typically be less than the cost of logging the actual data in ID[4], so it can be beneficial to log here. In that case, the log entry could include "3:A2:R[A0,B1,B0]->W", that is, core A2 performed a read->write transition, and cores A0, B1 and B0 had access.
[00121] [00121] What happens in ID[4] depends on what was logged in ID[3]. If nothing was logged in ID[3], then the data is logged (that is, "4:A2:R[DATA]->[2]"). On the other hand, if a packet was logged in ID[3], then there is nothing to log.
[00122] [00122] In ID[5] there is a read that crosses cores. However, if core A2 still has the cache line as modified (or equivalent), then its cache serves the request (it cannot be served from memory). In this case, socket B will know that the data came from socket A, and re-logging the data can be avoided; this could be logged as "5:B0:W[A2]->R". If the cache obtained the data from main memory (which could be the case if socket A was able to update main memory and change its cache coherence state for the line), then the entry could be "5:B0:R[DATA]->[2]".
[00123] [00123] In ID[6] the operation is a normal read. As with the read in ID[2], socket B may or may not know about socket A's data. If it does, the log entry could include "6:B1:R[B0,A2]->R"; otherwise, it could include "6:B1:R[B0]->R".
[00124] [00124] In ID[7], if the cache line for B0 has not been removed, there is nothing to log. If it has been removed, processor B would either log the data as coming from another core, or log the cache line data itself. Such a removal from one core, but not from others in the socket, usually does not happen in fully inclusive caches. In a fully inclusive cache, if any core in the socket has the cache line in its L1 cache, then the L3 cache also has the cache line, so the cache line cannot be removed for one core but not for another.
[00125] [00125] In ID[8], since core A0 has not logged anything since its line was invalidated, and the first operation to log is a write, this is similar to the operation in ID[3]. Processor A can log this as a read->write; alternatively, though perhaps less preferably, processor A could log nothing. If the packet is logged, its contents would vary depending on whether socket A can see socket B. If it cannot, the packet could include "8:A0:R[A2]->W"; if it can, the packet could include "8:A0:R[B0,B1,A2]->W".
[00126] [00126] In ID[9] there is nothing to log if a packet was logged in ID[8] (since this is a write to an already-logged cache line), although the cache line state for the other cores is typically invalidated at this point, if it was not already.
[00127] [00127] In ID[10], the logging depends on what was logged in ID[8]. If no data was logged in ID[8], then it needs to be logged here, so the packet could include "10:A1:R[DATA]->[4]". If a packet was logged in ID[8], this is a normal write->read packet (for example, "10:A1:W[A0]->R").
[00128] [00128] In ID[11] the read->read transition is logged. If a packet was logged in ID[8], then A0 is in the source core list (for example, "11:A2:R[A0,A1]->R"); otherwise, A0 is not in the list (for example, "11:A2:R[A1]->R").
[00129] [00129] In ID[12], if socket B can see socket A, this is a read->read packet (for example, "12:B0:R[A0,A1,A2]->R"). If it cannot, then this is a full data log entry (for example, "12:B0:R[DATA]->[4]").
[00130] [00130] In ID[13] the data comes from B0, plus socket A if it is visible (for example, "13:B1:R[A0,A1,A2,B0]->R"). The list can omit core A0 if the write was not logged in ID[8].
[00131] [00131] In ID[14] nothing needs to be logged if a packet was already logged in ID[8]. Otherwise, A0 will get the data from A1 and A2, plus potentially from socket B if it is visible. As such, the packet could include "14:A0:R[A1,A2,B0,B1]->R".
[00132] [00132] Note that despite this example having registered the
[00133] [00133] Also, at any point in time the cache line can be removed, which would mean that the data would need to be obtained from another core or re-logged. For example, if A0 had its cache line removed before ID[11], then A2 would get the value from A1. If the line were removed from both A1 and A0, then processor A may need to log the cache line value in the trace for A2.
[00134] [00134] Finally, some processors may know that data came from another socket, but not which core in that socket. In these cases, the processor could log the provenance (source) as a socket ID, log the data itself, or log the socket ID plus a hash of the data (that is, enough to help order cross-socket accesses, without needing to log the full data in the trace).
[00135] [00135] In a second example, shown in Figure 8D, the CCP uses indexes instead of tracking the cache coherence of each core separately. In this environment, the index could be tracked cross-socket or intra-socket. Due to the performance of cross-socket versus intra-socket communications, the latter case (intra-socket) may be more practical. When the index is tracked intra-socket, the trace may need to log something when data moves across sockets. This could include logging the index of the other socket (though this may not necessarily be unique enough for a deterministic trace), logging a hash of one or more portions of the cache line value, or logging a packet in the socket's trace to indicate that the data was sent.
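As an illustration of the hash option, and assuming (hypothetically) a 64-byte cache line and a 16-bit truncated FNV-1a hash, such a cross-socket packet could be computed as in the following sketch:

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>

// Hypothetical sketch: when data crosses sockets, log a small hash of the
// cache line instead of the full 64 bytes; the hash helps replay order
// cross-socket accesses without a full data record.
uint16_t cacheLineHash(const uint8_t (&line)[64]) {
    uint32_t h = 2166136261u;  // FNV-1a offset basis
    for (size_t i = 0; i < 64; ++i) {
        h = (h ^ line[i]) * 16777619u;  // FNV-1a prime
    }
    return static_cast<uint16_t>(h ^ (h >> 16));  // truncate to 16 bits
}

int main() {
    uint8_t line[64] = {1, 2, 3};
    std::cout << std::hex << cacheLineHash(line) << '\n';
}
```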
[00136] [00136] When tracking core indexes while using a non-fully-inclusive cache, a complication arises because an L1 cache may have data that is not in the L3 cache. So, for example, assume the following sequence of events: (i) A0 gets a line into its L1 cache (so the index bits refer to A0); (ii) A1 gets the line into its L1 cache (so the index bits refer to A1); (iii) the L3 cache removes the line; (iv) A1 removes the line from its L1 cache; and (v) A2 obtains the cache line from A0 into its L1 cache. Here, although A2 obtains the cache line from A0, the index does not refer to A0. This complicates mapping the log entries at replay. Some solutions could include adding extra information (as described above), such as hashes of one or more portions of the cache line data, or periodically adding redundant information such as a hash of the general-purpose registers. Logging removals could also help, but this can significantly increase the trace file size and complicate the logging (for example, logging L1 cache removals of lines that are not in the L2 or L3 caches, but not logging L1 cache removals of lines that are in the L2 or L3 caches).
[00137] [00137] In some modalities, when data moves from an L3 cache to a child L2 or L1 cache, a log entry is only made if the index changes. For example, suppose A0 has the line in its L1 cache (so the index bits refer to A0), then A1 gets the line in its L1 cache (index now A1), and then both remove the cache line while the common L2 (or L3) cache still has it. If the L2 cache then serves A1, there is nothing to log. If the L2 cache serves A0, then no log entry needs to be made if it is known that A0 already has the data; but if it is not known (or cannot be determined) whether A0 already has the data, then the processor may need to make a log entry.
[00138] [00138] Table 800d presents a log of the operations of table 800b, assuming that the sockets log independently, that tracking is performed by index, that there are no extra hidden removals, and that all writes that impact the CCP, and everything that happens when logging is turned on, are logged (for example, a write does not need to be logged if there are consecutive writes by the same core with no intervening access to the line by another core or another external entity). For the second example, the log could include:
[00139] [00139] For ID[0], "0:A0:R[DATA]->[1]".
[00140] [00140] For ID[1], "1:B0:R[DATA]->[1]"; recall that each socket logs separately.
[00141] [00141] For ID[2], "2:A1:R[A0]->R".
[00142] [00142] For ID[3], "3:A2:R[A1]->W".
[00143] [00143] For ID[4], nothing.
[00144] [00144] For ID[5], "5:B0:R[DATA]->[2]". This is because the write in ID[3] invalidated the line across all sockets, and the sockets are being tracked independently (as noted above).
[00145] [00145] For ID[6], "6:B1:R[B0]->R".
[00146] [00146] For ID[7], if the cache line for B0 has not been removed, there is nothing to log.
[00147] [00147] For ID[8], "8:A0:R[A2]->W", since the logging bit is set (even though this core has not logged the data before). This entry demonstrates how, with indexes, only the last owner within the socket is known.
[00148] [00148] For ID[9], there is nothing to log.
[00149] [00149] For ID[10], "10:A1:W[A0]->R".
[00150] [00150] For ID[11], "11:A2:R[A1]->R".
[00151] [00151] For ID[12], "12:B0:R[DATA]->[4]". This is because the cache line was invalidated across all sockets in ID[8].
[00152] [00152] For ID[13], "13:B1:R[B0]->R".
[00153] [00153] For ID[14], "14:A0:R[A2]->R". Note that in ID[11] the index was updated to A2. Note also that it would not be known that this core already had the data (that is, from ID[9]), since the index does not carry this information, whereas the per-processing-unit state (unit bits) was able to carry it.
[00154] [00154] In a third example, the caches in the environment 800a are unable to keep track of which core has the most recent shared (read) access to a cache line. Thus, in this example, the index of the last reader cannot be tracked, as there are no bits with which to do so. Here, the CCP can use one index value (which does not map to any core) to signal a shared line, another index value to signal an invalid line, and the core's index to signal a "modified" state (for example, using an MSI protocol). In this third example, logging could include logging the cache's index in a packet, instead of the core's index. Movements from parent caches to child caches need not be logged, but could be logged as extra data. If parent-to-child movements are not logged, then the parent/child cache hierarchy may need to be provided for the log to be interpreted.
[00155] [00155] As mentioned above, in some environments each cache line in a cache could include a single signaling bit, while the processor's CCP tracks the coherence state of each cache line with reference to the index of a processing unit that has the cache line's coherence state. As mentioned, this produces fully deterministic traces, but can result in larger traces than in cases that have per-processing-unit information (for example, a CCP that tracks per processing unit, in combination with a signaling bit per cache line). Figures 9A and 9B illustrate how logging may differ in these two situations (that is, per-unit CCP information plus a cache line signaling bit, versus CCP index plus a cache line signaling bit). Figure 9A illustrates a table 900a showing reads and writes by two processing units (P0 and P1), and Figure 9B illustrates a table 900b that compares when log entries could be made in these two environments. In these examples, assume that the signaling bit starts out cleared, and that the unit/index bits indicate that no processing unit has access to the cache line.
[00156] [00156] Initially, if the CCP tracks per-unit information and the cache line uses a signaling bit, logging could proceed as follows. As shown in table 900b, in ID[0] nothing needs to be logged, since this is a write to a cache line that was not logged (alternatively, the value before the write could be logged and the signaling bit set). At this point, the CCP may note that neither P0 nor P1 has access to the cache line. In ID[1], the cache line data could be logged for P1. The signaling bit could be turned on, and the CCP could note that P1 has access to the cache line. In ID[2], a read->read packet could be logged, with P0 taking the cache line from P1 (this is logged since the signaling bit was on, and the CCP is used to determine that P0 did not yet have access). The signaling bit was already on, and the CCP notes that P0 now also has access to the cache line. In ID[3], nothing needs to be logged (the cache line is already logged for this core). This is determined because the signaling bit is on and the CCP indicates that P1 already had access to the cache line. In ID[4], a read->write packet could be logged for P0. This is because the signaling bit is on and P0 already has access to the cache line. Since this was a write, the CCP could note that P0 now has the cache line modified, and that P1's access was invalidated.
[00157] [00157] Now, if the CCP tracks index information only and the cache line uses the signaling bit, logging could proceed as follows. As shown in table 900b, in ID[0] nothing needs to be logged, since the signaling bit is off and this is a write. As before, this could alternatively be logged as a read plus a write, if the memory is readable by P0. In ID[1] the cache line data could be logged for P1. The signaling bit could be turned on, and the CCP could update the index to point to P1. In ID[2] a read->read packet could be logged for P0. This is because the signaling bit is already set and the index refers to P1. The CCP can update the index to P0. In ID[3] a read->read packet could be logged for P1. Note that this case is now indistinguishable from ID[2], since in both cases the index refers to the other processing unit, the signaling bit is on, and the cache line is in a shared state. The CCP can update the index to P1. In ID[4], a read->write packet could be logged for P0. The signaling bit is on, so the packet can be logged by reference. This updates the CCP index to P0. In ID[5] a write->read packet could be logged for P1. The signaling bit is on, so the packet is logged by reference. The cache line moves to a shared state, so the CCP updates the index to P1. As shown in table 900b, tracking only an index thus produces more log entries (for example, the entry in ID[3]) than tracking per-unit information.
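A minimal C++ sketch of this index-plus-signaling-bit decision logic follows; the names are hypothetical, and the sketch mirrors the ID[0]-ID[3] walkthrough above.

```cpp
#include <iostream>
#include <optional>

// Each cache line carries one signaling bit; the CCP tracks a single index.
struct LineState {
    bool flag = false;         // signaling bit: line participates in logging
    std::optional<int> index;  // CCP index: unit the line is attributed to
};

// Returns a description of what (if anything) is logged for an access.
const char* access(LineState& s, int unit, bool isWrite) {
    if (!s.flag) {
        if (isWrite) return "nothing (write to an unlogged line)";
        s.flag = true;
        s.index = unit;
        return "full cache line data";
    }
    if (s.index == unit) return "nothing (index already refers to this unit)";
    s.index = unit;  // the packet can be logged by reference
    return isWrite ? "transition packet by reference (-> write)"
                   : "transition packet by reference (-> read)";
}

int main() {
    LineState s;
    std::cout << access(s, 0, true)  << '\n';  // ID[0]: P0 write, nothing
    std::cout << access(s, 1, false) << '\n';  // ID[1]: P1 read, full data
    std::cout << access(s, 0, false) << '\n';  // ID[2]: P0 read, by reference
    std::cout << access(s, 1, false) << '\n';  // ID[3]: P1 read, logged again
}
```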
[00158] [00158] Some modalities here have indicated that it may be beneficial, in terms of trace file size, to log data packets that reference data owned by another processing unit (when possible), rather than re-logging the cache line data (for example, ID[4] in each of the preceding examples). Other benefits can also flow from logging by reference. For example, at replay, when there is a series of log entries that are by reference, it can be inferred that no external intervention occurred on the cache line data. This is because when full cache line data is re-logged, it means that the cache line was either removed or invalidated. Thus, including log entries by reference, even in situations where a log entry may not be strictly necessary, can provide implicit information about the absence of external interventions, which may be useful information for replay or for debugging.
[00159] [00159] In some implementations, the addresses that are recorded in the trace entries (for example, the "@" entries above) comprise physical memory addresses. In these implementations, processor 102 can write one or more entries of TLB 102f to the trace file(s) 104d. This can be done as part of the trace data streams for the different processing units, or as part of a separate additional trace data stream. This will later allow replay software to map these physical addresses to virtual addresses.
[00160] [00160] In addition, since physical addresses can sometimes be considered "secret" information (for example, when recording at the user-mode level), some modalities record some representation of the actual physical addresses, instead of the physical addresses themselves. This representation could be any representation that uniquely maps identifiers to physical addresses without revealing the physical addresses. One example could be a hash of each physical address. When these representations are used, and entries of TLB 102f are written to the trace file(s) 104d, processor 102 records a mapping between these representations and virtual addresses, instead of between physical addresses and virtual addresses.
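For illustration only, the following sketch pairs a stand-in address representation with a TLB-style mapping packet. The mixing function shown (a SplitMix64-style finalizer) is merely illustrative and is in fact invertible, so an implementation concerned with secrecy would instead use a keyed cryptographic hash; all names here are hypothetical.

```cpp
#include <cstdint>
#include <iostream>

// Produce a representation of a physical address for use in trace entries.
// NOTE: this finalizer is invertible; it stands in for a real one-way,
// keyed hash, which a secrecy-preserving implementation would require.
uint64_t addressToken(uint64_t physicalAddress) {
    uint64_t z = physicalAddress + 0x9e3779b97f4a7c15ull;
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ull;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebull;
    return z ^ (z >> 31);
}

struct TlbPacket {
    uint64_t token;           // representation logged in place of the address
    uint64_t virtualAddress;  // mapping recorded for later replay
};

int main() {
    TlbPacket p{addressToken(0x1234000), 0x7fff0000};
    std::cout << std::hex << p.token << " -> " << p.virtualAddress << '\n';
}
```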
[00161] [00161] As mentioned, processor 102 may include one or more temporary stores 102e. These temporary stores can be used as a temporary storage location for trace file entries before those entries are actually written to the trace file(s) 104d. Thus, when act 305 causes data to be recorded in the trace, act 305 could comprise writing the data to the temporary store(s) 102e. In some modalities, processor 102 employs deferred-logging techniques in order to reduce the impact of writing trace data on processor 102 and the memory bus. In these modalities, processor 102 can store trace data in the temporary store(s) 102e and defer writing to the trace file(s) 104d until there is available bandwidth on the memory bus, or until the temporary store(s) 102e is/are full.
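A minimal software analogue of such deferred logging (hypothetical class and file names; the real mechanism is in processor hardware) could look like the following:

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Trace entries accumulate in a buffer and are flushed to the trace file
// only when the buffer fills; hardware would additionally flush when the
// memory bus is otherwise idle.
class TraceBuffer {
    std::vector<unsigned char> buf_;
    size_t capacity_;
    FILE* traceFile_;

public:
    TraceBuffer(size_t capacity, FILE* f) : capacity_(capacity), traceFile_(f) {
        buf_.reserve(capacity);
    }
    void append(const unsigned char* data, size_t len) {
        if (buf_.size() + len > capacity_) flush();  // defer until full
        buf_.insert(buf_.end(), data, data + len);
    }
    void flush() {
        if (!buf_.empty()) {
            fwrite(buf_.data(), 1, buf_.size(), traceFile_);
            buf_.clear();
        }
    }
};

int main() {
    FILE* f = fopen("trace.bin", "wb");
    if (!f) return 1;
    TraceBuffer tb(4096, f);
    unsigned char packet[8] = {0};
    tb.append(packet, sizeof packet);
    tb.flush();
    fclose(f);
}
```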
[00162] [00162] As was also mentioned, some modalities can log cache removals. Figures 10A and 10B illustrate some modalities of how cache removals can be logged in an efficient manner (that is, in terms of trace file size) by leveraging the properties of associative caches. Initially, Figure 10A illustrates an example 1000 of the different parts of a memory address and their relationship to associative caches. As shown, memory addresses include a first plurality of bits 1001 that are the low bits of the address and that are typically zero. The first plurality of bits 1001 is zero because memory addresses are typically aligned to a memory address size (for example, 32 bits, 64 bits, etc.). Thus, the number of bits in the first plurality of bits 1001 is dependent on the size of the memory address. For example, if a memory address is 32 bits (that is, 2^5 bits), then the first plurality of bits 1001 comprises five bits (so that memory addresses are multiples of 32); if a memory address is 64 bits (that is, 2^6 bits), then the first plurality of bits 1001 comprises six bits (so that memory addresses are multiples of 64); and so on. The memory addresses also include a second plurality of bits 1002 that can be used by processor 102 to determine the specific address group in an associative cache in which the data for the memory address should be stored. In the example 1000 of Figure 10A, the second plurality of bits 1002 comprises three bits, which would correspond to an associative cache that has eight address groups. The number of bits in the second plurality of bits 1002 is therefore dependent on the specific geometry of the associative cache. The memory addresses also include a third plurality of bits 1003 that comprises the remaining high bits of the memory address.
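Using the geometry assumed in example 1000 (five low bits 1001 and three group-selection bits 1002), the group and tag of an address can be computed as in the following sketch:

```cpp
#include <cstdint>
#include <iostream>

// Geometry of example 1000: five low (alignment) bits and three group bits
// for an eight-group associative cache; the remaining high bits are the tag.
constexpr uint64_t kLowBits = 5;
constexpr uint64_t kGroupBits = 3;

uint64_t groupOf(uint64_t address) {
    return (address >> kLowBits) & ((1u << kGroupBits) - 1);
}

uint64_t tagOf(uint64_t address) {
    return address >> (kLowBits + kGroupBits);
}

int main() {
    // Matches the example below: @1024 and @2048 map to group 0, @2112 to 2.
    std::cout << groupOf(1024) << ' ' << groupOf(2112) << ' '
              << groupOf(2048) << '\n';  // prints: 0 2 0
    std::cout << tagOf(2048) << '\n';
}
```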
[00163] [00163] In the context of Figure 10A, Figure 10B illustrates an example 1004 of logging cache misses and cache removals in an associative cache. Initially, example 1004 shows three memory addresses: 1005 (that is, address @ 1024), 1006 (that is, address @ 2112) and 1007 (that is, address @ 2048), together with an associative cache 1010.
[00164] [00164] Now, suppose there is a first cache miss for address 1005 (that is, @ 1024). Here, since its second plurality of bits 1002 is '000', processor 102 can determine that it must store the data corresponding to address 1005 in group 0 of cache 1010. The specific way within group 0 is typically chosen by processor-specific logic. For the purposes of example 1004, however, assume that the data is stored in way 0 (as shown by arrow 1011a). In connection with this cache miss, the log data recorded by tracker 104a could include the memory address (that is, @ 1024) and the way (that is, way 0) in which the data was stored. Note that any number of compression techniques could be used to reduce the number of bits needed to store the memory address in the trace. The group (that is, group 0) does not need to be logged, because it can be obtained from the second plurality of bits 1002 of the memory address.
[00165] [00165] Next, suppose there is a second cache miss for address 1006 (that is, @ 2112). This time, the second plurality of bits 1002 is '010', so processor 102 can determine that it must store the data corresponding to address 1006 in group 2 of cache 1010. Again, the specific way within group 2 is typically chosen by processor-specific logic. For the purposes of example 1004, however, assume that the data is stored in way 0 (as shown by arrow 1011b). In connection with this cache miss, the log data recorded by tracker 104a could include the memory address (that is, @ 2112) and the way (that is, way 0) in which the data was stored. Again, the group (that is, group 2) does not need to be logged, because it can be obtained from the second plurality of bits 1002 of the memory address.
[00166] [00166] Now suppose there is a third cache miss for address 1007 (that is, @ 2048). The second plurality of bits 1002 is again '000', so processor 102 can determine that it must store the data corresponding to address 1007 in group 0 of cache 1010. The specific way is again chosen by processor-specific logic, but suppose the processor chooses way 0 (as shown by arrow 1011c). In connection with this cache miss, the log data recorded by tracker 104a could include the memory address (that is, @ 2048) and the way (that is, way 0) in which the data was stored. Again, the group (that is, group 0) does not need to be logged, because it can be obtained from the second plurality of bits 1002 of the memory address.
[00167] [00167] Since cache line (0,0) currently corresponds to address 1005, this third cache miss for address 1007 causes address 1005 to be removed from cache 1010. However, the modalities may refrain from logging any trace data documenting this removal. This is because the removal can be inferred from data already in the trace, that is, from the first cache miss for address 1005 in way 0, together with the third cache miss for address 1007 in way 0. Although the group (that is, group 0) may not be expressly logged in the trace, it can be inferred from these addresses. As such, replay of this trace data can reproduce the removal.
[00168] [00168] Some removals result from events other than a cache miss. For example, a CCP can cause a removal to occur in order to maintain consistency between different caches. Suppose, for example, that address 1006 is removed from cache line (2,0) of cache 1010 due to a CCP event. Here, the removal can be expressly logged by logging the group (that is, '010') and the way (that is, '00') of the removal. Notably, the address that was removed does not need to be logged, since it was already captured when logging the second cache miss that brought address 1006 into cache line (2,0). Consequently, in this example, the removal can be fully captured in the trace file(s) 104d with a mere five bits of log data (before any form of compression).
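Assuming, hypothetically, eight groups (three bits) and four ways (two bits), such a five-bit removal entry could be packed as in the following sketch:

```cpp
#include <cstdint>
#include <iostream>

// Pack a CCP-induced removal into five bits: three group bits, two way bits.
// The removed address need not be logged, since the earlier cache miss that
// brought it in already put the address in the trace.
uint8_t packEviction(uint8_t group, uint8_t way) {
    return static_cast<uint8_t>((group & 0x7) << 2 | (way & 0x3));
}

int main() {
    // Removal of address 1006 from cache line (2,0): group '010', way '00'.
    uint8_t packet = packEviction(/*group=*/2, /*way=*/0);
    std::cout << "packet bits: " << int(packet) << '\n';  // 0b01000 == 8
}
```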
[00169] [00169] Some modalities are also able to safely trace the activity of a processing unit even when a thread executing on that processing unit interacts with a secure enclave. As will be appreciated by those skilled in the art, enclaves are hardware-based security features that can protect confidential information (for example, cryptographic keys, credentials, biometric data, etc.) from even potentially the lowest-level software that runs on processor 102. Thus, in addition to protecting confidential information from user-mode processes, enclaves can even protect confidential information from kernel and/or hypervisor software.
[00170] [00170] The first enclave-aware tracing modalities trace an executing process while refraining from tracing an enclave with which the process interacts, while still allowing the traced process to be fully replayed. In these modalities, memory reads by the executing process from its own address space are traced/logged using one or more of the mechanisms already described herein. When there is a context switch to the enclave, however, the modalities can track any memory locations that were previously read by the traced process and that are written by the enclave during its execution. When the traced process executes again after the switch to the enclave, these memory location(s) are treated as if they had not been logged by the traced process. In this way, if the traced process again reads from these memory location(s) (potentially reading data that was placed in those location(s) by the enclave), these reads are logged in the trace. This effectively means that any side effects of the enclave's execution that are visible to the traced process are captured in the trace, without having to trace the enclave's execution. The traced process can later be replayed using these side effects, without actually needing (or even being able) to replay the enclave's execution. There are several mechanisms (previously described) that can be used to track the memory location(s) that were previously read by the traced process and that are written by the enclave during its execution, such as count bits (for example, signaling bits, unit bits, index bits), way-locking, the use of CCP data, etc.
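By way of a simplified software model (hypothetical names; real implementations would use the hardware mechanisms just listed), the demote-on-enclave-write behavior can be sketched as follows:

```cpp
#include <cstdint>
#include <iostream>
#include <set>

// Locations the traced process has read are marked "logged"; any of them
// written by the enclave are demoted so that the next read by the traced
// process is logged again, capturing the enclave's visible side effects
// without tracing the enclave itself.
struct EnclaveAwareTracker {
    std::set<uint64_t> logged;  // cache lines considered logged

    bool onProcessRead(uint64_t line) {
        // Returns true if this read must be logged to the trace.
        return logged.insert(line).second;
    }
    void onEnclaveWrite(uint64_t line) {
        logged.erase(line);  // treat the line as no longer logged
    }
};

int main() {
    EnclaveAwareTracker t;
    std::cout << t.onProcessRead(0x40) << '\n';  // 1: first read is logged
    std::cout << t.onProcessRead(0x40) << '\n';  // 0: already logged
    t.onEnclaveWrite(0x40);                      // enclave side effect
    std::cout << t.onProcessRead(0x40) << '\n';  // 1: logged again
}
```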
[00171] [00171] The second enclave-aware tracing modalities trace the executing process (for example, based on accesses, such as reads, to its own address space) while also tracing the enclave (for example, based on accesses to the enclave's own address space and/or accesses to the address space of the traced process). These modalities could be implemented when there is the necessary level of trust between the kernel/hypervisor and the enclave. In these modalities, the trace data related to the enclave's execution could be logged in a separate and/or encrypted trace data stream, so that any entity performing a replay is unable to replay the enclave without access to the enclave's separate trace data stream and/or the cryptographic key(s) that can be used to decrypt the trace data related to the enclave's execution.
[00172] [00172] The third enclave-aware tracing modalities combine the first and second modalities. Thus, these third modalities can record a trace of an executing process that includes the side effects of the process's use of an enclave (that is, per the first modalities), along with a trace of the enclave itself (that is, per the second modalities). This allows the execution of the traced process to be replayed by a user who lacks the required privilege level and/or cryptographic key(s), while allowing a user who does have the necessary privilege level and/or cryptographic key(s) to also replay the execution of the enclave itself.
[00173] [00173] Each of these enclave tracing modalities is applicable beyond enclaves, to any situation in which a traced entity interacts with another entity whose execution needs to be protected during tracing (referred to herein as a protected entity). For example, any of these modalities could be used when tracing a user-mode process that interacts with a kernel-mode process; here, the kernel-mode process could be treated similarly to an enclave. In another example, any of these modalities could be used when tracing a kernel process that interacts with a hypervisor; here, the hypervisor could be treated similarly to an enclave.
[00174] [00174] There may be environments in which it is not practical (for example, due to performance or security considerations), not possible (for example, due to a lack of hardware support), or not desirable to track which memory location(s) were previously read by a traced process and written by a protected entity during its execution. This may prevent the use of the enclave tracing modalities described above. However, there are also techniques for tracing in these situations.
[00175] [00175] A first technique is to treat the processor cache as having been invalidated after a context switch from the protected entity. Treating the processor cache as having been invalidated causes reads by the traced entity, after the protected entity returns, to produce cache misses, which can be logged. These cache misses will include any values in the traced entity's address space that were modified by the protected entity and subsequently read by the traced entity.
[00176] [00176] A second technique is to log cache misses related to reads by the protected entity from the address space of the traced entity, as well as the writes performed by the protected entity to the address space of the traced entity. This allows a replay of the trace to reproduce the protected entity's writes without having access to the protected entity's instructions that produced them. It also provides replay access to the data (in the traced entity's address space) that the protected entity read and that the traced entity subsequently accessed. Hybrid approaches are possible (if sufficient accounting information, such as CCP data, is available) that log the protected entity's writes (to the traced entity's address space) but not its reads, with those reads being logged later as a result of treating the cache as invalidated.
[00177] [00177] The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described modalities are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes that come within the meaning and range of equivalency of the claims are to be embraced within their scope.
Claims (15)
1. Computing device, characterized by the fact that it comprises: a plurality of processing units; a cache memory comprising a plurality of cache lines that are used to cache data from one or more support stores and that are shared by the plurality of processing units, wherein consistency between data in the plurality of cache lines and the one or more support stores is managed according to a cache coherence protocol (CCP); and stored control logic that configures the computing device to perform at least the following: determine that at least the following conditions have been met: (i) an operation caused an interaction between a specific cache line of the plurality of cache lines and the one or more support stores; (ii) logging is enabled for the specific processing unit, of the plurality of processing units, that caused the operation; (iii) the specific cache line is a participant in logging; and (iv) the CCP indicates that there is data to be recorded in a trace based on the operation; and based at least on determining that the conditions have been met, cause the data to be recorded in the trace, the data being usable to reproduce the operation.
2. Computing device according to claim 1, characterized by the fact that the stored control logic also configures the computing device to update one or more count bits associated with the specific cache line to indicate whether the specific cache line remains a participant in logging after the operation.
3. Computing device according to claim 2, characterized by the fact that the one or more count bits associated with the specific cache line comprise one of: (i) a single bit; (ii) a plurality of bits, each corresponding to one of the plurality of processing units; or (iii) a plurality of bits that stores a processor index value.
4. Computing device according to claim 2, characterized by the fact that the one or more count bits associated with the specific cache line are stored in one or more reserved cache lines that are separate from the cache lines that are used to cache data from the one or more support stores.
5. Computing device according to claim 1, characterized by the fact that causing the data to be recorded in the trace comprises writing the data to a temporary store, and wherein flushing the data from the temporary store to the trace file is deferred based on memory bus activity.
6. Computing device according to claim 1, characterized by the fact that the stored control logic also configures the computing device to log at least one cache removal by reference to a group and a way in an associative cache.
7. Computing device according to claim 1, characterized by the fact that the recorded data comprise transitions between different CCP states.
8. Computing device according to claim 1, characterized by the fact that the recorded data comprise at least one of: a transition from a write state to a read state, a transition from a write state to a write state, or a transition from a read state to a write state.
9. Computing device according to claim 1, characterized by the fact that using the CCP to identify that there is data to be recorded in a trace comprises determining that a transition from a read state to a read state does not need to be recorded in the trace.
10. Computing device according to claim 1, characterized by the fact that the data for each processing unit is recorded in at least one separate data stream.
11. Computing device according to claim 1, characterized by the fact that data for two or more processing units are recorded in the same data stream, but tagged with a processing unit identifier.
12. Computing device according to claim 1, characterized by the fact that the data to be recorded in the trace comprise ordering information.
13. Computing device according to claim 1, characterized by the fact that the data to be recorded comprise data written to the specific cache line by an enclave, and wherein causing the data to be recorded in the trace comprises: when the operation that caused the interaction between the specific cache line and the one or more support stores corresponds to a thread that interacts with the enclave, causing the data to be recorded in a trace data stream that corresponds to the thread; or when the operation that caused the interaction between the specific cache line and the one or more support stores corresponds to the enclave, causing the data to be recorded separately from the trace data stream that corresponds to the thread.
14. Method, implemented in a computing environment that includes a plurality of processing units and a cache memory comprising a plurality of cache lines that are used to cache data from one or more support stores and that are shared by the plurality of processing units, wherein consistency between data in the plurality of cache lines and the one or more support stores is managed according to a cache coherence protocol (CCP), for performing a cache-based trace recording using CCP data, the method characterized by the fact that it comprises: determining that at least the following conditions have been met: (i) an operation caused an interaction between a specific cache line of the plurality of cache lines and the one or more support stores; (ii) logging is enabled for the specific processing unit, of the plurality of processing units, that caused the operation; (iii) the specific cache line is a participant in logging; and (iv) the CCP indicates that there is data to be recorded in a trace based on the operation; and based at least on determining that the conditions have been met, causing the data to be recorded in the trace, the data being usable to reproduce the operation.
15. Computer program product for use in a computing device that comprises a plurality of processing units and a cache memory comprising a plurality of cache lines that are used to cache data from one or more support stores and that are shared by the plurality of processing units, wherein consistency between data in the plurality of cache lines and the one or more support stores is managed according to a cache coherence protocol (CCP), the computer program product characterized by the fact that it comprises a computer-readable medium having stored thereon computer-executable instructions that are executable by one or more of the processing units to cause the computing device to perform at least the following: determine that at least the following conditions have been met: (i) an operation caused an interaction between a specific cache line of the plurality of cache lines and the one or more support stores; (ii) logging is enabled for the specific processing unit, of the plurality of processing units, that caused the operation; (iii) the specific cache line is a participant in logging; and (iv) the CCP indicates that there is data to be recorded in a trace based on the operation; and based at least on determining that the conditions have been met, cause the data to be recorded in the trace, the data being usable to reproduce the operation.
Similar technologies:
Publication number | Publication date | Patent title
BR112020003342A2|2020-08-18|cache-based trace recording using cache coherence protocol data
US9424200B2|2016-08-23|Continuous run-time integrity checking for virtual memory
US9990237B2|2018-06-05|Lockless write tracking
JP2021515312A|2021-06-17|Trace recording by logging inflows into the lower layer cache based on entries in the upper layer cache
CN103699498A|2014-04-02|Application key data protection system and protection method
BR112020014668A2|2020-12-01|log on-demand cache inflows to a top-level cache
KR20190108109A|2019-09-23|Implementation of atomic primitives using cache line locking
BR112020023084A2|2021-02-02|cache-based trace playback breakpoints using reserved tag field bits
US10558572B2|2020-02-11|Decoupling trace data streams using cache coherence protocol data
US20120143838A1|2012-06-07|Hierarchical software locking
Chen et al.2019|PD-DM: An efficient locality-preserving block device mapper with plausible deniability.
WO2020057394A1|2020-03-26|Method and device for monitoring memory access behavior of sample process
KR20210079266A|2021-06-29|Parameter signatures for realm security configuration parameters
KR20210075064A|2021-06-22|Trust Mediator Realm
CN107463513B|2021-01-12|System and method for transferring control between storage locations
Alwadi2020|High Performance and Secure Execution Environments for Emerging Architectures
WO2017020194A1|2017-02-09|File system protection method, device and storage apparatus
CN112099905A|2020-12-18|Method and device for acquiring dirty pages of virtual machine, electronic equipment and readable storage medium
WO2021225896A1|2021-11-11|Memory page markings as logging cues for processor-based execution tracing
TW201926061A|2019-07-01|Scrub - commit state for memory region
Patent family:
Publication number | Publication date
AU2018334370A1|2020-02-20|
WO2019055094A1|2019-03-21|
CA3072872A1|2019-03-21|
PH12020550109A1|2020-12-07|
CL2020000645A1|2020-09-11|
US10459824B2|2019-10-29|
RU2020113601A3|2022-01-31|
US20190087305A1|2019-03-21|
IL272745D0|2020-04-30|
RU2020113601A|2021-10-20|
CN111095222A|2020-05-01|
KR20200056430A|2020-05-22|
EP3665575A1|2020-06-17|
CO2020002932A2|2020-04-13|
SG11202001913RA|2020-04-29|
ZA202001262B|2021-05-26|
ES2887195T3|2021-12-22|
EP3665575B1|2021-07-28|
JP2020534589A|2020-11-26|
Cited references:
Publication number | Application date | Publication date | Applicant | Patent title

US4598364A|1983-06-29|1986-07-01|International Business Machines Corporation|Efficient trace method adaptable to multiprocessors|
AU3776793A|1992-02-27|1993-09-13|Intel Corporation|Dynamic flow instruction cache memory|
US5905855A|1997-02-28|1999-05-18|Transmeta Corporation|Method and apparatus for correcting errors in computer systems|
US6009270A|1997-04-08|1999-12-28|Advanced Micro Devices, Inc.|Trace synchronization in a processor|
US6167536A|1997-04-08|2000-12-26|Advanced Micro Devices, Inc.|Trace cache for a microprocessor-based device|
US6094729A|1997-04-08|2000-07-25|Advanced Micro Devices, Inc.|Debug interface including a compact trace record storage|
US5944841A|1997-04-15|1999-08-31|Advanced Micro Devices, Inc.|Microprocessor with built-in instruction tracing capability|
US6101524A|1997-10-23|2000-08-08|International Business Machines Corporation|Deterministic replay of multithreaded applications|
US6553564B1|1997-12-12|2003-04-22|International Business Machines Corporation|Process and system for merging trace data for primarily interpreted methods|
US6351844B1|1998-11-05|2002-02-26|Hewlett-Packard Company|Method for selecting active code traces for translation in a caching dynamic translator|
US6854108B1|2000-05-11|2005-02-08|International Business Machines Corporation|Method and apparatus for deterministic replay of java multithreaded programs on multiprocessors|
US7448025B2|2000-12-29|2008-11-04|Intel Corporation|Qualification of event detection by thread ID and thread privilege level|
US6634011B1|2001-02-15|2003-10-14|Silicon Graphics, Inc.|Method and apparatus for recording program execution in a microprocessor based integrated circuit|
US20020144101A1|2001-03-30|2002-10-03|Hong Wang|Caching DAG traces|
US7181728B1|2001-04-30|2007-02-20|Mips Technologies, Inc.|User controlled trace records|
US7178133B1|2001-04-30|2007-02-13|Mips Technologies, Inc.|Trace control based on a characteristic of a processor's operating state|
US7185234B1|2001-04-30|2007-02-27|Mips Technologies, Inc.|Trace control from hardware and software|
US20030079205A1|2001-10-22|2003-04-24|Takeshi Miyao|System and method for managing operating systems|
US7051239B2|2001-12-28|2006-05-23|Hewlett-Packard Development Company, L.P.|Method and apparatus for efficiently implementing trace and/or logic analysis mechanisms on a processor chip|
US7089400B1|2002-08-29|2006-08-08|Advanced Micro Devices, Inc.|Data speculation based on stack-relative addressing patterns|
US20040117690A1|2002-12-13|2004-06-17|Andersson Anders J.|Method and apparatus for using a hardware disk controller for storing processor execution trace information on a storage device|
US20040139305A1|2003-01-09|2004-07-15|International Business Machines Corporation|Hardware-enabled instruction tracing|
US7526757B2|2004-01-14|2009-04-28|International Business Machines Corporation|Method and apparatus for maintaining performance monitoring structures in a page table for use in monitoring performance of a computer program|
US20050223364A1|2004-03-30|2005-10-06|Peri Ramesh V|Method and apparatus to compact trace in a trace buffer|
US8010337B2|2004-09-22|2011-08-30|Microsoft Corporation|Predicting database system performance|
US7447946B2|2004-11-05|2008-11-04|Arm Limited|Storage of trace data within a data processing apparatus|
JP4114879B2|2005-01-21|2008-07-09|インターナショナル・ビジネス・マシーンズ・コーポレーション|Trace information collection system, trace information collection method, and trace information collection program|
US7640539B2|2005-04-12|2009-12-29|International Business Machines Corporation|Instruction profiling using multiple metrics|
US8301868B2|2005-09-23|2012-10-30|Intel Corporation|System to profile and optimize user software in a managed run-time environment|
US7877630B1|2005-09-28|2011-01-25|Oracle America, Inc.|Trace based rollback of a speculatively updated cache|
US7984281B2|2005-10-18|2011-07-19|Qualcomm Incorporated|Shared interrupt controller for a multi-threaded processor|
US9268666B2|2005-10-21|2016-02-23|Undo Ltd.|System and method for debugging of computer programs|
US7620938B2|2005-10-31|2009-11-17|Microsoft Corporation|Compressed program recording|
US20070106827A1|2005-11-08|2007-05-10|Boatright Bryan D|Centralized interrupt controller|
US7461209B2|2005-12-06|2008-12-02|International Business Machines Corporation|Transient cache storage with discard function for disposable data|
US20070150881A1|2005-12-22|2007-06-28|Motorola, Inc.|Method and system for run-time cache logging|
US20070220361A1|2006-02-03|2007-09-20|International Business Machines Corporation|Method and apparatus for guaranteeing memory bandwidth for trace data|
US7958497B1|2006-06-07|2011-06-07|Replay Solutions, Inc.|State synchronization in recording and replaying computer programs|
US7676632B2|2006-07-18|2010-03-09|Via Technologies, Inc.|Partial cache way locking|
US7472218B2|2006-09-08|2008-12-30|International Business Machines Corporation|Assisted trace facility to improve CPU cache performance|
US20080250207A1|2006-11-14|2008-10-09|Davis Gordon T|Design structure for cache maintenance|
US20080114964A1|2006-11-14|2008-05-15|Davis Gordon T|Apparatus and Method for Cache Maintenance|
US8370806B2|2006-11-15|2013-02-05|Qualcomm Incorporated|Non-intrusive, thread-selective, debugging method and system for a multi-thread digital signal processor|
US8261130B2|2007-03-02|2012-09-04|Infineon Technologies Ag|Program code trace signature|
US8484516B2|2007-04-11|2013-07-09|Qualcomm Incorporated|Inter-thread trace alignment method and system for a multi-threaded processor|
US20090037886A1|2007-07-30|2009-02-05|Mips Technologies, Inc.|Apparatus and method for evaluating a free-running trace stream|
CN101446909B|2007-11-30|2011-12-28|国际商业机器公司|Method and system for managing task events|
US8078807B2|2007-12-27|2011-12-13|Intel Corporation|Accelerating software lookups by using buffered or ephemeral stores|
US8413122B2|2009-02-12|2013-04-02|International Business Machines Corporation|System and method for demonstrating the correctness of an execution trace in concurrent processing environments|
US8402318B2|2009-03-24|2013-03-19|The Trustees Of Columbia University In The City Of New York|Systems and methods for recording and replaying application execution|
US8589629B2|2009-03-27|2013-11-19|Advanced Micro Devices, Inc.|Method for way allocation and way locking in a cache|
US8140903B2|2009-04-16|2012-03-20|International Business Machines Corporation|Hardware process trace facility|
US8423965B2|2009-06-23|2013-04-16|Microsoft Corporation|Tracing of data flow|
JP2011013867A|2009-06-30|2011-01-20|Panasonic Corp|Data processor and performance evaluation analysis system|
US8719796B2|2010-01-26|2014-05-06|The Board Of Trustees Of The University Of Illinois|Parametric trace slicing|
US8468501B2|2010-04-21|2013-06-18|International Business Machines Corporation|Partial recording of a computer program execution for replay|
US9015441B2|2010-04-30|2015-04-21|Microsoft Technology Licensing, Llc|Memory usage scanning|
US8499200B2|2010-05-24|2013-07-30|Ncr Corporation|Managing code-tracing data|
US20120042212A1|2010-08-10|2012-02-16|Gilbert Laurenti|Mixed Mode Processor Tracing|
US9645913B2|2011-08-03|2017-05-09|Daniel Geist|Method and apparatus for debugging programs|
US20130055033A1|2011-08-22|2013-02-28|International Business Machines Corporation|Hardware-assisted program trace collection with selectable call-signature capture|
US8584110B2|2011-09-30|2013-11-12|International Business Machines Corporation|Execution trace truncation|
US8612650B1|2012-03-13|2013-12-17|Western Digital Technologies, Inc.|Virtual extension of buffer to reduce buffer overflow during tracing|
WO2013147898A1|2012-03-30|2013-10-03|Intel Corporation|Tracing mechanism for recording shared memory interleavings on multi-core processors|
US9304863B2|2013-03-15|2016-04-05|International Business Machines Corporation|Transactions for checkpointing and reverse execution|
US9058415B1|2013-03-15|2015-06-16|Google Inc.|Counting events using hardware performance counters and annotated instructions|
US9189360B2|2013-06-15|2015-11-17|Intel Corporation|Processor that records tracing data in non contiguous system memory slices|
US9086974B2|2013-09-26|2015-07-21|International Business Machines Corporation|Centralized management of high-contention cache lines in multi-processor computing environments|
US9336110B2|2014-01-29|2016-05-10|Red Hat, Inc.|Identifying performance limiting internode data sharing on NUMA platforms|
US9535815B2|2014-06-04|2017-01-03|Nvidia Corporation|System, method, and computer program product for collecting execution statistics for graphics processing unit workloads|
US9300320B2|2014-06-27|2016-03-29|Qualcomm Incorporated|System and method for dictionary-based cache-line level code compression for on-chip memories using gradual bit removal|
US9875173B2|2014-06-30|2018-01-23|Microsoft Technology Licensing, Llc|Time travel debugging in managed runtime|
US9361228B2|2014-08-05|2016-06-07|Qualcomm Incorporated|Cache line compaction of compressed data segments|
US9588870B2|2015-04-06|2017-03-07|Microsoft Technology Licensing, Llc|Time travel debugging for browser components|
EP3338192A1|2015-08-18|2018-06-27|Telefonaktiebolaget LM Ericsson |Method for observing software execution, debug host and debug target|
US9767237B2|2015-11-13|2017-09-19|Mentor Graphics Corporation|Target capture and replay in emulation|
US9569338B1|2015-12-02|2017-02-14|International Business Machines Corporation|Fingerprint-initiated trace extraction|
US10031834B2|2016-08-31|2018-07-24|Microsoft Technology Licensing, Llc|Cache-based tracing for time travel debugging and analysis|
US10031833B2|2016-08-31|2018-07-24|Microsoft Technology Licensing, Llc|Cache-based tracing for time travel debugging and analysis|
US10489273B2|2016-10-20|2019-11-26|Microsoft Technology Licensing, Llc|Reuse of a related thread's cache while recording a trace file of code execution|
US10310977B2|2016-10-20|2019-06-04|Microsoft Technology Licensing, Llc|Facilitating recording a trace file of code execution using a processor cache|
US10324851B2|2016-10-20|2019-06-18|Microsoft Technology Licensing, Llc|Facilitating recording a trace file of code execution using way-locking in a set-associative processor cache|
US10445211B2|2017-08-28|2019-10-15|Microsoft Technology Licensing, Llc|Logging trace data for program code execution at an instruction level|
US9378560B2|2011-06-17|2016-06-28|Advanced Micro Devices, Inc.|Real time on-chip texture decompression using shader processors|
US10031834B2|2016-08-31|2018-07-24|Microsoft Technology Licensing, Llc|Cache-based tracing for time travel debugging and analysis|
US11048615B2|2018-01-08|2021-06-29|Ozcode Ltd.|Time travel source code debugger incorporating visual annotations|
US10558572B2|2018-01-16|2020-02-11|Microsoft Technology Licensing, Llc|Decoupling trace data streams using cache coherence protocol data|
US10541042B2|2018-04-23|2020-01-21|Microsoft Technology Licensing, Llc|Level-crossing memory trace inspection queries|
US10592396B2|2018-04-23|2020-03-17|Microsoft Technology Licensing, Llc|Memory validity states in time-travel debugging|
US11126537B2|2019-05-02|2021-09-21|Microsoft Technology Licensing, Llc|Coprocessor-based logging for time travel debugging|
LU101769B1|2020-05-05|2021-11-08|Microsoft Technology Licensing Llc|Omitting processor-based logging of separately obtainable memory values during program tracing|
LU101768B1|2020-05-05|2021-11-05|Microsoft Technology Licensing Llc|Recording a cache coherency protocol trace for use with a separate memory value trace|
LU101770B1|2020-05-05|2021-11-05|Microsoft Technology Licensing Llc|Memory page markings as logging cues for processor-based execution tracing|
LU101767B1|2020-05-05|2021-11-05|Microsoft Technology Licensing Llc|Recording a memory value trace for use with a separate cache coherency protocol trace|
Legal status:
2021-11-03| B350| Update of information on the portal [chapter 15.35 patent gazette]|
Priority:
Application number | Application date | Patent title
US201762559780P| true| 2017-09-18|2017-09-18|
US62/559,780|2017-09-18|
US15/915,930|2018-03-08|
US15/915,930|US10459824B2|2017-09-18|2018-03-08|Cache-based trace recording using cache coherence protocol data|
PCT/US2018/038875|WO2019055094A1|2017-09-18|2018-06-22|Cache-based trace recording using cache coherence protocol data|